Scaling Laws in Deep Neural Networks: Insights from Statistical Mechanics and Exactly Solvable Models
ORAL · Invited
Abstract
Artificial deep neural networks are complex, nonlinear statistical models whose learning and function often depend strongly on the model architecture and on the choice of data and algorithm. Empirically, it has been observed that the generalization ability of such networks on learning tasks is frequently governed by power-law trends with respect to simple scaling variables, such as the amount of training data and the number of learnable parameters. A full understanding of what governs this scaling, and in particular a prescriptive theoretical framework, is still lacking. Toward this end, I will discuss our work introducing a classification of different regimes of behavior, namely "resolution-limited" and "variance-limited" regimes, based on the mechanistic origins of the scaling. Along the way, I will review and then leverage insights from recently discovered exactly solvable models of deep neural networks, a setting in which the different regimes can be derived exactly. I'll close by discussing implications and remaining challenges.
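As a concrete illustration of the kind of empirical trend referred to above, the minimal sketch below fits a power law of the form L(D) = a·D^(-alpha) + c to test-loss measurements taken at several dataset sizes D. The data, the exponent, and the fitting routine are illustrative assumptions for this sketch and are not taken from the talk or its underlying work.

```python
# Minimal sketch: fitting a power law L(D) = a * D**(-alpha) + c to
# (synthetic) test-loss measurements at several dataset sizes D.
# All numerical values here are made up for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def power_law(D, a, alpha, c):
    """Test loss as a function of dataset size: a * D^(-alpha) + c."""
    return a * np.power(D, -alpha) + c

# Hypothetical measurements: dataset sizes and observed test losses.
D = np.array([1e3, 3e3, 1e4, 3e4, 1e5, 3e5])
rng = np.random.default_rng(0)
clean = power_law(D, a=5.0, alpha=0.35, c=0.02)
loss = clean * (1 + 0.03 * rng.standard_normal(D.shape))  # add small noise

# Fit the three parameters; p0 is a rough starting guess for (a, alpha, c).
params, _ = curve_fit(power_law, D, loss, p0=(1.0, 0.5, 0.0))
a_hat, alpha_hat, c_hat = params
print(f"fitted exponent alpha ~ {alpha_hat:.2f}, irreducible loss c ~ {c_hat:.3f}")
```

The same fitting procedure applies when the scaling variable is the number of learnable parameters rather than the dataset size; only the measurements change.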
Publication: None
Presenters
- Yasaman Bahri, Google LLC

Authors
- Yasaman Bahri, Google LLC