Statistical Mechanics of Double Descent in Deep Learning: A Phase Transition Perspective
ORAL
Abstract
Double descent describes how model performance first degrades as model complexity grows, then unexpectedly improves again beyond a critical threshold as the model becomes highly overparameterized. This behavior is a signature of structure beyond the traditional bias-variance tradeoff and is reminiscent of the way order emerges from collective dynamics, yet it still lacks a theoretical understanding from the perspective of statistical mechanics and phase transitions. Here we introduce a framework for formulating and understanding double descent in neural networks, using dynamical mean-field theory to analyze their asymptotic behavior in the long-time limit, both in the thermodynamic limit and for finite-size systems. We derive the generalization and training errors from the mean-field equations and extract the effective dynamics of the neural network, represented as the behavior of a single effective connection. We apply this framework to the linear regression problem, the simplest model exhibiting double descent, demonstrate data collapse, and determine the critical exponents characterizing its double descent, thereby interpreting double descent as an emergent phenomenon. This work provides a phase transition perspective that explains double descent in terms of underlying critical dynamics, and suggests how other emergent phenomena in neural networks might be approached.
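For readers unfamiliar with the phenomenon, the short Python sketch below (not taken from the paper; the teacher-student setup, noise level, and variable names are illustrative assumptions) reproduces the textbook double descent curve for ridgeless linear regression: the test error of the minimum-norm least-squares fit peaks at the interpolation threshold p = n and descends again as the model becomes overparameterized.

import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, sigma = 50, 2000, 0.1   # training samples, test samples, label noise

def avg_test_mse(p, n_trials=30):
    """Mean test MSE of the minimum-norm least-squares fit with p features."""
    errs = []
    for _ in range(n_trials):
        w_star = rng.standard_normal(p) / np.sqrt(p)      # random teacher, ||w*|| ~ 1
        X = rng.standard_normal((n_train, p))
        y = X @ w_star + sigma * rng.standard_normal(n_train)
        w_hat = np.linalg.pinv(X) @ y                     # ridgeless / minimum-norm solution
        X_te = rng.standard_normal((n_test, p))
        errs.append(np.mean((X_te @ w_hat - X_te @ w_star) ** 2))
    return np.mean(errs)

# Test error peaks near the interpolation threshold p = n_train, then descends again.
for p in (10, 30, 45, 50, 55, 75, 150, 500):
    print(f"p = {p:3d}   p/n = {p / n_train:5.2f}   test MSE = {avg_test_mse(p):.3f}")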
Presenters
- Chan Li, University of California, San Diego
Authors
- Chan Li, University of California, San Diego
- Nigel Goldenfeld, University of California, San Diego