The Onset of Variance-Limited Behavior for Neural Networks at Finite Width and Sample Size.

ORAL

Abstract

For small training set sizes, the generalization error of a wide neural network is well approximated by that of an infinite-width neural network. However, beyond a critical training set size, the generalization of the finite-width network begins to worsen relative to the infinite-width performance. We empirically study the transition from the infinite-width behavior to this variance-limited regime as a function of training set size, network width, and network initialization scale. For polynomial regression with ReLU networks, we find that finite-size effects can become relevant already at dataset sizes scaling as the square root of the width. We trace the source of this finite-size behavior to the variance of the network's final neural tangent kernel (NTK). Using this, we provide a toy model that exhibits the same scaling and shows sample-size-dependent benefits from feature learning.
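
The abstract attributes the finite-size behavior to the variance of the network's NTK across initializations. As a minimal illustrative sketch (not the authors' experiment), the snippet below estimates the initialization variance of a single empirical-NTK entry for a one-hidden-layer ReLU network and checks that it shrinks roughly like 1/width; the architecture, the 1/sqrt(width) NTK parameterization, the input dimension, and the seed count are all illustrative assumptions.

```python
import numpy as np

def empirical_ntk_entry(x1, x2, width, rng):
    """Empirical NTK entry K(x1, x2) for a one-hidden-layer ReLU net
    f(x) = a . relu(W x) / sqrt(width), at a random initialization."""
    d = x1.shape[0]
    W = rng.standard_normal((width, d))   # hidden-layer weights
    a = rng.standard_normal(width)        # readout weights
    h1, h2 = W @ x1, W @ x2               # pre-activations for each input
    # df/da_i = relu(h_i) / sqrt(width)
    term_a = np.maximum(h1, 0) @ np.maximum(h2, 0)
    # df/dW_i = a_i * 1[h_i > 0] * x / sqrt(width)
    term_W = (a**2 * (h1 > 0) * (h2 > 0)).sum() * (x1 @ x2)
    return (term_a + term_W) / width

rng = np.random.default_rng(0)
d, n_seeds = 10, 2000
x1, x2 = rng.standard_normal(d), rng.standard_normal(d)

# Variance over initializations should decay roughly as 1/width,
# since the NTK entry is an average over iid neuron contributions.
for width in [64, 256, 1024, 4096]:
    ks = [empirical_ntk_entry(x1, x2, width, rng) for _ in range(n_seeds)]
    print(f"width={width:5d}  mean={np.mean(ks):.4f}  var={np.var(ks):.2e}")
```

This sketch only verifies the O(1/width) decay of the kernel's initialization variance, which is the ingredient that makes an onset of finite-size effects at dataset sizes of order sqrt(width) plausible; it does not reproduce the paper's generalization experiments.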

Publication: Planned paper "The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes".

Presenters

  • Alexander B Atanasov

    Harvard University

Authors

  • Alexander B Atanasov

    Harvard University

  • Cengiz Pehlevan

    Harvard University

  • Blake Bordelon

    Harvard University

  • Sabarish Sainathan

    Harvard University