Exploring the loss landscape with Langevin dynamics
ORAL
Abstract
In supervised learning, neural network training is founded on the minimization of a high-dimensional loss function. A better understanding of its landscape is crucial for designing better-performing learning algorithms. We explore the loss landscape of an over-parametrized deep network through numerical experiments. Starting from a global minimum, we study the dynamics of SGD with added random noise, which generates a competition between diffusion and gradient descent. Most notably, we observe unexpected catastrophic dynamics and investigate how they relate to the values of the hyperparameters, such as the learning rate and the batch size, and to the characteristics of the loss landscape.
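The dynamics described here amount to noisy SGD: each update combines a gradient-descent drift with an added random diffusion term. Below is a minimal PyTorch sketch of such a Langevin-type step; the model, loss, temperature, and learning rate are illustrative assumptions, not the authors' actual setup.

```python
# Minimal sketch (not the authors' code): SGD with additive Gaussian noise,
# i.e. a discretized Langevin update  w <- w - lr * grad + sqrt(2 * lr * T) * xi.
import torch

def langevin_step(model, loss_fn, x, y, lr=1e-2, temperature=1e-4):
    """One noisy-SGD (Langevin) update: gradient-descent drift plus diffusion."""
    loss = loss_fn(model(x), y)
    model.zero_grad()
    loss.backward()
    noise_scale = (2.0 * lr * temperature) ** 0.5
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr * p.grad                      # drift: gradient descent
                p += noise_scale * torch.randn_like(p)  # diffusion: random noise
    return loss.item()

# Hypothetical usage: start from a trained (global-minimum) model and iterate
# langevin_step over mini-batches to let the weights diffuse on the loss landscape.
```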
Presenters
- Théo Jules (Raymond and Beverly Sackler School of Physics and Astronomy, Tel Aviv University)
Authors
- Théo Jules (Raymond and Beverly Sackler School of Physics and Astronomy, Tel Aviv University)
- Yohai Bar-Sinai (Google LLC, Tel Aviv University)