
The Role of Data in the Sloppiness of Deep Networks

ORAL

Abstract

We study how the dataset may be the cause of the anomalous generalization performance of deep networks. We show that the data correlation matrix of typical classification datasets has an eigenspectrum where, after a sharp initial drop, a large number of small eigenvalues are distributed uniformly over an exponentially large range. This structure is mirrored in a network trained on this data: we show that the Hessian and the Fisher Information Matrix (FIM) have eigenvalues that are spread uniformly over exponentially large ranges. For such "sloppy" eigenspectra, sets of weights corresponding to small eigenvalues can be modified by large magnitudes without affecting the loss. Networks trained on atypical, non-sloppy synthetic data do not share these traits. We show how this structure in the data sheds light on the generalization performance of deep networks using PAC-Bayesian analysis.
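The eigenspectrum structure described above can be illustrated with a minimal sketch. The code below is not from the paper; it is a hypothetical example that builds synthetic anisotropic data whose feature scales decay as a power law (a simple stand-in for the kind of dataset whose correlation matrix is "sloppy"), then computes the eigenvalues of the data correlation matrix and measures how many orders of magnitude they span.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a classification dataset: n samples, d features,
# with power-law feature scales. This is an assumption for illustration,
# not the datasets used in the paper.
n, d = 2000, 100
scales = np.arange(1, d + 1) ** -1.0        # feature scale decays like 1/k
X = rng.standard_normal((n, d)) * scales    # anisotropic Gaussian data

# Data correlation (second-moment) matrix and its eigenvalues, descending.
C = X.T @ X / n
eigs = np.sort(np.linalg.eigvalsh(C))[::-1]

# A "sloppy" spectrum spreads over exponentially many scales: measure the
# range in decades (orders of magnitude) between largest and smallest.
decades = np.log10(eigs[0] / eigs[-1])
print(f"spectrum spans ~{decades:.1f} decades")
```

With the 1/k scales above, the eigenvalues decay roughly like 1/k², so the spectrum spans about four decades; isotropic data (all scales equal) would instead concentrate its eigenvalues near a single value.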

Presenters

  • Pratik Chaudhari

    University of Pennsylvania

Authors

  • Pratik Chaudhari

    University of Pennsylvania

  • Rubing Yang

    University of Pennsylvania

  • Jialin Mao

    University of Pennsylvania