A Picture of the Prediction Space of Deep Networks

ORAL · Invited

Abstract

There are two stark paradoxes in deep learning today. First, deep networks have many more parameters than training samples and can therefore overfit. And yet, these networks predict remarkably accurately, defying accepted statistical wisdom. Second, training deep networks is a high-dimensional, large-scale, non-convex optimization problem and should be prohibitively hard. And yet, training is tractable, even easy. This talk seeks to shed light upon these paradoxes. It will use techniques from information geometry to study the prediction space of deep networks.

I will argue that deep networks generalize well because of a characteristic structure in the space of learning tasks. The input correlation matrix for typical tasks has a “sloppy” eigenspectrum where, in addition to a few large eigenvalues, there is a large number of small eigenvalues that are distributed uniformly over a very large range. As a consequence, quantities such as the Hessian or the Fisher Information Matrix also have a sloppy eigenspectrum. Using these ideas, I will demonstrate an analytical non-vacuous generalization bound for deep networks.

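To make the structure described above concrete, here is a minimal, illustrative sketch (not the authors' code) of computing the eigenspectrum of an input correlation matrix and checking how widely its eigenvalues spread; the random data and variable names are placeholders, and a real experiment would substitute an actual dataset:

```python
import numpy as np

# Illustrative sketch: eigenspectrum of the input correlation matrix for a
# dataset whose inputs are flattened into the rows of X (shape: n_samples x d).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 256))   # placeholder data, not a real dataset

# Input correlation (second-moment) matrix, d x d.
C = X.T @ X / X.shape[0]

# Eigenvalues in decreasing order (C is symmetric positive semi-definite).
eigvals = np.linalg.eigvalsh(C)[::-1]
eigvals = eigvals[eigvals > 0]

# A "sloppy" spectrum has a few large eigenvalues and many small ones spread
# roughly uniformly over a very wide range of scales.
print("decades spanned:", np.log10(eigvals[0] / eigvals[-1]))
print("fraction within 1% of the largest:", np.mean(eigvals > 0.01 * eigvals[0]))
```
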
I will argue that training a deep network is computationally tractable because for sloppy tasks, the training process explores an extremely low-dimensional (~0.001% of the dimensionality of the embedding space) manifold in the prediction space. Models with different neural architectures (fully-connected, convolutional, residual, and attention-based), training methods (stochastic gradient descent and variants), weight initializations (random vs. pre-training on random labels), and regularization techniques (weight-decay, batch-normalization, and data-augmentation) evolve along very similar trajectories in the prediction space when trained for the same task and traverse a very similar manifold.

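As a rough illustration of what it means to track trajectories in prediction space, the sketch below represents each checkpoint of a model by its predicted class probabilities on a fixed probe set and projects the resulting trajectory with ordinary PCA; this is a simplified stand-in, assuming such checkpointed probabilities are available, and it is not the information-geometric embedding developed in the papers listed below:

```python
import numpy as np

# Illustrative sketch (not the authors' method): embed a training trajectory in
# prediction space. Each checkpoint is summarized by its predicted class
# probabilities on a fixed probe set, flattened into one long vector; PCA then
# gives low-dimensional coordinates for the trajectory.

def trajectory_matrix(prob_checkpoints):
    """prob_checkpoints: list of (n_probe, n_classes) arrays, one per checkpoint."""
    return np.stack([p.reshape(-1) for p in prob_checkpoints])

def pca_embed(P, k=3):
    """Project the rows of P onto their top-k principal components."""
    P = P - P.mean(axis=0, keepdims=True)
    U, S, _ = np.linalg.svd(P, full_matrices=False)
    return U[:, :k] * S[:k]

# Toy usage: random probabilities stand in for real model checkpoints.
rng = np.random.default_rng(0)
fake_checkpoints = [rng.dirichlet(np.ones(10), size=500) for _ in range(20)]
coords = pca_embed(trajectory_matrix(fake_checkpoints))
print(coords.shape)   # (20, 3): one low-dimensional point per checkpoint
```
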
Publications

1. Yang, R., Mao, J. & Chaudhari, P. Does the Data Induce Capacity Control in Deep Learning? Proc. of the International Conference on Machine Learning (2022). arXiv: https://arxiv.org/abs/2110.14163
2. Mao, J., Griniasty, I., Yang, R., Teoh, H. K., Ramesh, R., Transtrum, M., Sethna, J. & Chaudhari, P. A Picture of the Prediction Space of Deep Neural Networks (in preparation).
3. Ramesh, R., Mao, J., Griniasty, I., Yang, R., Teoh, H. K., Transtrum, M., Sethna, J. & Chaudhari, P. A Picture of the Space of Learning Tasks (in preparation).

Presenters

  • Pratik Chaudhari

    University of Pennsylvania

Authors

  • Jialin Mao

    University of Pennsylvania

  • Itay Griniasty

    Cornell University

  • Rubing Yang

    University of Pennsylvania

  • Han Kheng Teoh

    Cornell University

  • Rahul Ramesh

    University of Pennsylvania

  • Mark K Transtrum

    Brigham Young University

  • James P Sethna

    Cornell University

  • Pratik Chaudhari

    University of Pennsylvania