Exploring the loss landscape with Langevin dynamics

ORAL

Abstract

In supervised learning, neural network training is founded on the minimization of a high-dimensional loss function. A better understanding of this function's landscape is crucial for designing better-performing learning algorithms. We explore the loss landscape of an over-parametrized deep network through numerical experiments. Starting from a global minimum, we study the dynamics of SGD with added random noise, which generates a competition between diffusion and gradient descent. Most notably, we observe unexpected catastrophic dynamics and investigate how they relate to the values of hyperparameters, such as the learning rate and the batch size, and to the characteristics of the loss landscape.
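The abstract itself contains no code, but the dynamics it describes, gradient descent plus isotropic Gaussian noise whose temperature sets the diffusion/descent balance, follow the standard Langevin (SGLD-style) update. The sketch below is purely illustrative: the tiny over-parametrized network, the random data, and all parameter values are assumptions, not the authors' actual experimental setup.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained over-parametrized network:
# a small MLP fit to random data, started near a (local) minimum.
torch.manual_seed(0)
x = torch.randn(64, 10)
y = torch.randn(64, 1)
model = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 1))
loss_fn = nn.MSELoss()


def langevin_step(model, lr=1e-2, temperature=1e-4):
    """One SGLD-style update:
        theta <- theta - lr * grad + sqrt(2 * lr * T) * xi,
    where xi ~ N(0, I). The temperature T controls the competition
    between diffusion (noise) and gradient descent."""
    loss = loss_fn(model(x), y)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            noise = torch.randn_like(p) * (2.0 * lr * temperature) ** 0.5
            p.add_(-lr * p.grad + noise)
    return loss.item()


for step in range(1000):
    loss = langevin_step(model)
    if step % 100 == 0:
        print(f"step {step:4d}  loss {loss:.4f}")
```

At low temperature the gradient term dominates and the network stays near the minimum; raising the temperature (or, in the talk's setting, the effective noise set by the learning rate and batch size) lets diffusion carry the parameters out into the surrounding landscape.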

Presenters

  • Théo Jules

    Raymond and Beverly Sackler School of Physics and Astronomy, Tel Aviv University

Authors

  • Théo Jules

    Raymond and Beverly Sackler School of Physics and Astronomy, Tel Aviv University

  • Yohai Bar-Sinai

    Google LLC, Tel Aviv University