The Role of Data in the Sloppiness of Deep Networks
ORAL
Abstract
We study how the dataset may be the cause of the anomalous generalization performance of deep networks. We show that the data correlation matrix of typical classification datasets has an eigenspectrum where, after a sharp initial drop, a large number of small eigenvalues are distributed uniformly over an exponentially large range. This structure is mirrored in a network trained on this data: we show that the Hessian and the Fisher Information Matrix (FIM) have eigenvalues that are spread uniformly over exponentially large ranges. For such "sloppy" eigenspectra, sets of weights corresponding to small eigenvalues can be modified by large magnitudes without affecting the loss. Networks trained on atypical, non-sloppy synthetic data do not share these traits. Using PAC-Bayesian analysis, we show how this structure in the data sheds light on the generalization performance of deep networks.
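
The central object here, the eigenspectrum of the data correlation matrix, is straightforward to compute. Below is a minimal Python sketch (our illustration, not the authors' code) that synthesizes data with a prescribed power-law covariance to mimic the sloppy spectrum described in the abstract; for a real experiment one would substitute flattened inputs from a dataset such as MNIST or CIFAR-10.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 256  # number of samples, input dimension (both illustrative)

# Prescribe eigenvalues decaying as k^{-2}: after the initial drop, the
# log-eigenvalues are spread roughly uniformly over many decades, i.e.,
# an exponentially large range, which is the "sloppy" structure.
spectrum = np.arange(1, d + 1) ** -2.0
X = rng.standard_normal((n, d)) * np.sqrt(spectrum)  # scale each mode

# Data correlation (second-moment) matrix and its eigenvalues, descending.
C = X.T @ X / n
eigvals = np.linalg.eigvalsh(C)[::-1]

# Diagnostic: how many decades the spectrum spans. A sloppy spectrum spans
# many decades with roughly uniform log-spacing; a non-sloppy (e.g.,
# isotropic Gaussian) spectrum is clustered within a narrow range.
decades = np.log10(eigvals[0] / eigvals[-1])
print(f"spectrum spans ~{decades:.1f} decades across {d} eigenvalues")
```

On a histogram of log-eigenvalues, the sloppy case shows an approximately flat distribution over the small-eigenvalue tail; replacing `spectrum` with `np.ones(d)` gives a non-sloppy baseline for comparison.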
Presenters
- Pratik Chaudhari, University of Pennsylvania
Authors
- Pratik Chaudhari, University of Pennsylvania
- Rubing Yang, University of Pennsylvania
- Jialin Mao, University of Pennsylvania