The Onset of Variance-Limited Behavior for Neural Networks at Finite Width and Sample Size.
ORAL
Abstract
For small training set sizes, the generalization error of wide neural networks is well approximated by that of an infinite-width network. Beyond a certain training set size, however, the finite-width network's generalization begins to worsen relative to the infinite-width performance. We empirically study the transition from the infinite-width behavior to this variance-limited regime as a function of training set size, network width, and network initialization scale. We find that finite-size effects can become relevant at dataset sizes as small as the square root of the width for polynomial regression with ReLU networks. We trace the source of this finite-size behavior to the variance of the network's final neural tangent kernel (NTK). Using this, we provide a toy model that exhibits the same scaling and shows sample-size-dependent benefits from feature learning.
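To make the setup concrete, below is a minimal sketch (not the authors' code) of the kind of comparison the abstract describes: ridge regression with finite-width random ReLU features versus the corresponding infinite-width (arc-cosine) kernel, swept over training set size P and width N. The cubic target function, the specific widths, the ridge parameter, and the number of repeats are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_features(x, w):
    # Random ReLU features phi(x) = max(0, w*x) / sqrt(N): a finite-width,
    # lazy-regime stand-in for a one-hidden-layer ReLU network.
    return np.maximum(0.0, np.outer(x, w)) / np.sqrt(len(w))

def relu_kernel(x1, x2):
    # Infinite-width (N -> infinity) limit of the random-feature kernel in 1D:
    # E_w[relu(w*x1) * relu(w*x2)] with w ~ N(0, 1), i.e. the arc-cosine kernel.
    theta = np.arccos(np.clip(np.sign(np.outer(x1, x2)), -1.0, 1.0))
    return np.abs(np.outer(x1, x2)) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2.0 * np.pi)

def gen_error(P, N, ridge=1e-4, P_test=512):
    # Mean-squared generalization error of ridge regression on an illustrative
    # polynomial target f(x) = x^3 - x, for width N (None = infinite width).
    x_tr = rng.uniform(-1.0, 1.0, P)
    x_te = rng.uniform(-1.0, 1.0, P_test)
    f = lambda x: x**3 - x
    y_tr, y_te = f(x_tr), f(x_te)
    if N is None:
        # Infinite-width limit: kernel ridge regression with the arc-cosine kernel.
        K = relu_kernel(x_tr, x_tr)
        alpha = np.linalg.solve(K + ridge * np.eye(P), y_tr)
        pred = relu_kernel(x_te, x_tr) @ alpha
    else:
        # Finite width: ridge regression on N random ReLU features.
        w = rng.standard_normal(N)
        Phi_tr, Phi_te = relu_features(x_tr, w), relu_features(x_te, w)
        beta = np.linalg.solve(Phi_tr.T @ Phi_tr + ridge * np.eye(N), Phi_tr.T @ y_tr)
        pred = Phi_te @ beta
    return np.mean((pred - y_te) ** 2)

# Sweep training set size P for two widths and the infinite-width kernel limit;
# the finite-N curves are expected to deviate from the kernel curve as P grows.
for P in [4, 16, 64, 256]:
    errs = {N: np.mean([gen_error(P, N) for _ in range(20)]) for N in [64, 1024, None]}
    print(f"P={P:4d}  N=64: {errs[64]:.4f}  N=1024: {errs[1024]:.4f}  N=inf: {errs[None]:.4f}")
```

Random features in the lazy regime are used here purely as a tractable proxy for a finite-width network trained in the NTK regime; the sample size at which the finite-N error departs from the N = infinity error is the kind of transition the abstract refers to.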
–
Publication: Planned paper "The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes".
Presenters
-
Alexander B Atanasov
Harvard University
Authors
-
Alexander B Atanasov
Harvard University
-
Cengiz Pehlevan
Harvard University
-
Blake Bordelon
Harvard University
-
Sabarish Sainathan
Harvard University