A Picture of the Prediction Space of Deep Networks
ORAL · Invited
Abstract
I will argue that deep networks generalize well because of a characteristic structure in the space of learning tasks. The input correlation matrix for typical tasks has a “sloppy” eigenspectrum where, in addition to a few large eigenvalues, there is a large number of small eigenvalues that are distributed uniformly over a very large range. As a consequence, quantities such as the Hessian or the Fisher Information Matrix also have a sloppy eigenspectrum. Using these ideas, I will demonstrate an analytical non-vacuous generalization bound for deep networks.
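As a rough illustration of what a "sloppy" eigenspectrum looks like, the following sketch builds synthetic Gaussian inputs and computes the eigenvalues of their correlation matrix. It assumes that "distributed uniformly over a very large range" means roughly even spacing on a logarithmic scale; the helper names (`make_sloppy_inputs`, `correlation_spectrum`), the dimensions, and the eight-decade range are hypothetical choices for the example, not values from the talk.

```python
import numpy as np

def correlation_spectrum(X):
    """Eigenvalues of the input correlation matrix (1/n) X^T X, largest first."""
    n = X.shape[0]
    C = X.T @ X / n
    eigvals = np.linalg.eigvalsh(C)  # eigvalsh returns ascending order
    return eigvals[::-1]

def make_sloppy_inputs(n_samples=2000, dim=50, decades=8.0, seed=0):
    """Synthetic inputs whose population eigenvalues are evenly spaced in log10.

    Assumption: this is a toy stand-in for a 'typical task', not real data.
    """
    rng = np.random.default_rng(seed)
    true_eigs = 10.0 ** np.linspace(0.0, -decades, dim)
    # Random orthogonal basis so the structure is not axis-aligned.
    Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    Z = rng.standard_normal((n_samples, dim))
    X = (Z * np.sqrt(true_eigs)) @ Q.T
    return X, true_eigs

X, true_eigs = make_sloppy_inputs()
eigs = correlation_spectrum(X)

log_eigs = np.log10(eigs)
span_decades = log_eigs[0] - log_eigs[-1]  # range covered, in decades
gaps = -np.diff(log_eigs)                  # spacing between consecutive eigenvalues

print(f"eigenvalues span about {span_decades:.1f} decades")
print(f"mean log10 gap: {gaps.mean():.3f}, std: {gaps.std():.3f}")
```

On real data one would replace `make_sloppy_inputs` with the actual input matrix and inspect whether the log-eigenvalues fall off roughly linearly with their index.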
I will argue that training a deep network is computationally tractable because, for sloppy tasks, the training process explores an extremely low-dimensional manifold in the prediction space (~0.001% of the dimensionality of the embedding space). Models with different neural architectures (fully-connected, convolutional, residual, and attention-based), training methods (stochastic gradient descent and variants), weight initializations (random vs. pre-training on random labels), and regularization techniques (weight decay, batch normalization, and data augmentation) evolve along very similar trajectories in the prediction space when trained on the same task, and they traverse a very similar manifold.
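One way to probe how low-dimensional a set of training trajectories is can be sketched as follows: treat each checkpoint as a point in prediction space (the class probabilities on all samples, flattened into one vector), stack the checkpoints from several runs, and measure how much variance the top few principal components capture. Plain PCA is used here only as a simple stand-in and is not claimed to be the embedding method used in the work. The function `toy_trajectory` is a hypothetical substitute for real training and is low-dimensional by construction, so only the measurement procedure is being illustrated.

```python
import numpy as np

def explained_variance(P, k=3):
    """Fraction of total variance captured by the top-k principal components.

    P has one row per checkpoint and one column per (sample, class) probability.
    """
    centered = P - P.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2
    return var[:k].sum() / var.sum()

def toy_trajectory(targets, n_classes, n_steps, rate, rng):
    """Hypothetical stand-in for training: logits relax toward the labels, plus noise."""
    n = targets.shape[0]
    onehot = np.eye(n_classes)[targets]
    points = []
    for t in range(n_steps):
        progress = 1.0 - np.exp(-rate * t)
        logits = 5.0 * progress * onehot + 0.05 * rng.standard_normal((n, n_classes))
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)  # softmax over classes
        points.append(probs.ravel())               # one point in prediction space
    return np.array(points)

rng = np.random.default_rng(0)
targets = rng.integers(0, 10, size=200)

# Three "models" that learn at different speeds on the same task.
trajs = [toy_trajectory(targets, 10, 40, rate, rng) for rate in (0.05, 0.1, 0.2)]
P = np.vstack(trajs)  # 120 checkpoints in a 2000-dimensional prediction space

frac = explained_variance(P, k=3)
print(f"top-3 components explain {100 * frac:.1f}% of the variance")
```

With real checkpoints, `P` would instead hold the softmax outputs of each saved model evaluated on a fixed set of samples.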
Publications:
1. Yang, R., Mao, J. & Chaudhari, P. Does the Data Induce Capacity Control in Deep Learning? Proc. of the International Conference on Machine Learning (2022). arXiv: https://arxiv.org/abs/2110.14163
2. Mao, J., Griniasty, I., Yang, R., Teoh, H. K., Ramesh, R., Transtrum, M., Sethna, J. & Chaudhari, P. A Picture of the Prediction Space of Deep Neural Networks (in preparation).
3. Ramesh, R., Mao, J., Griniasty, I., Yang, R., Teoh, H. K., Transtrum, M., Sethna, J. & Chaudhari, P. A Picture of the Space of Learning Tasks (in preparation).
Presenters
- Pratik Chaudhari, University of Pennsylvania
Authors
- Jialin Mao, University of Pennsylvania
- Itay Griniasty, Cornell University
- Rubing Yang, University of Pennsylvania
- Han Kheng Teoh, Cornell University
- Rahul Ramesh, University of Pennsylvania
- Mark K Transtrum, Brigham Young University
- James P Sethna, Cornell University
- Pratik Chaudhari, University of Pennsylvania