
Understanding multi-pass stochastic gradient descent via dynamical mean-field theory

ORAL

Abstract

Artificial neural networks trained via stochastic gradient descent (SGD) have achieved impressive performance. A general consensus has arisen that understanding the success of SGD optimization requires a detailed description of the dynamical trajectory traversed during training. Yet this task is highly nontrivial, given that SGD follows a nonequilibrium dynamics in a high-dimensional, non-convex loss landscape. Thus, the practical success of SGD remains largely unexplained. We have applied dynamical mean-field theory to derive a full description of the learning curves of SGD and of its performance in prototypical models. This is the first work tracking the high-dimensional dynamics of SGD in the realistic case where the network reuses the available examples multiple times. We have also investigated how different sources of algorithmic noise affect performance. Comparing SGD to gradient descent in an intrinsically hard problem (phase retrieval), we have shown that SGD noise is key to finding good solutions. We have found that an effective fluctuation-dissipation theorem characterizes the stationary dynamics of SGD, and we have extracted the related effective temperature as a function of the hyperparameters. These results point to a novel analogy between SGD and active and driven systems.
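The comparison of multi-pass SGD and full-batch gradient descent on phase retrieval can be illustrated with a small numpy toy. This is only a hedged sketch, not the dynamical mean-field analysis of the paper: the dimensions, learning rate, batch size, and number of epochs below are illustrative choices, and the loss is the standard squared phase-retrieval loss on synthetic Gaussian data.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 300                    # input dimension, number of samples (alpha = n/d = 6)
X = rng.standard_normal((n, d)) / np.sqrt(d)
w_star = rng.standard_normal(d)
w_star *= np.sqrt(d) / np.linalg.norm(w_star)   # normalize so |w*|^2 = d
y = (X @ w_star) ** 2                           # noiseless phase-retrieval labels

def grad(w, idx):
    """Gradient of the loss 1/(4|idx|) * sum_{mu in idx} ((x_mu . w)^2 - y_mu)^2."""
    z = X[idx] @ w
    return X[idx].T @ ((z ** 2 - y[idx]) * z) / len(idx)

def overlap(w):
    """Normalized overlap |w . w*| / (|w| |w*|); 1 means perfect recovery (up to sign)."""
    return abs(w @ w_star) / (np.linalg.norm(w) * np.linalg.norm(w_star))

w0 = rng.standard_normal(d)       # identical random initialization for both algorithms
lr, epochs, batch = 0.05, 500, 30

# Full-batch gradient descent: deterministic trajectory, no algorithmic noise.
w_gd = w0.copy()
for _ in range(epochs):
    w_gd -= lr * grad(w_gd, np.arange(n))

# Multi-pass mini-batch SGD: every epoch reuses all n examples in random mini-batches,
# injecting the algorithmic noise discussed in the abstract.
w_sgd = w0.copy()
for _ in range(epochs):
    for idx in np.array_split(rng.permutation(n), n // batch):
        w_sgd -= lr * grad(w_sgd, idx)

print(f"GD  overlap with signal: {overlap(w_gd):.3f}")
print(f"SGD overlap with signal: {overlap(w_sgd):.3f}")
```

Running both dynamics from the same initialization makes the role of mini-batch noise visible in the final overlap with the signal; the quantitative behavior depends on the sample ratio alpha = n/d and the hyperparameters.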

Publication:

  • The effective noise of stochastic gradient descent and how local knowledge of partial information drives complex systems, Francesca Mignacco, Pierfrancesco Urbani. Article in preparation.

  • Stochasticity helps to navigate rough landscapes: comparing gradient-descent-based algorithms in the phase retrieval problem, Francesca Mignacco, Pierfrancesco Urbani, Lenka Zdeborova, Machine Learning: Science and Technology, 2021.

  • Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification, Francesca Mignacco, Florent Krzakala, Pierfrancesco Urbani and Lenka Zdeborova, Advances in Neural Information Processing Systems, 2020, vol. 33. To appear in the "Machine Learning 2021" Special Issue, JSTAT.

Presenters

  • Francesca Mignacco

    Institute of Theoretical Physics, CEA Saclay

Authors

  • Francesca Mignacco

    Institute of Theoretical Physics, CEA Saclay