A Picture of the Prediction Space of Deep Networks

ORAL · Invited

Abstract

There are two stark paradoxes in deep learning today. First, deep networks have many more parameters than training samples and can therefore overfit. And yet, these networks predict remarkably accurately, defying accepted statistical wisdom. Second, training deep networks is a high-dimensional, large-scale, non-convex optimization problem and should be prohibitively hard. And yet, training is tractable, even easy. This talk seeks to shed light upon these paradoxes. It will use techniques from information geometry to study the prediction space of deep networks.

I will argue that deep networks generalize well because of a characteristic structure in the space of learning tasks. The input correlation matrix for typical tasks has a “sloppy” eigenspectrum where, in addition to a few large eigenvalues, there is a large number of small eigenvalues that are distributed uniformly over a very large range. As a consequence, quantities such as the Hessian or the Fisher Information Matrix also have a sloppy eigenspectrum. Using these ideas, I will demonstrate an analytical non-vacuous generalization bound for deep networks.

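To make the structure described above concrete, here is a minimal, illustrative sketch (not the authors' code) of computing the eigenspectrum of an input correlation matrix and checking how widely its eigenvalues spread; the random data and variable names are placeholders, and a real experiment would substitute an actual dataset:

```python
import numpy as np

# Illustrative sketch: eigenspectrum of the input correlation matrix for a
# dataset whose inputs are flattened into the rows of X (shape: n_samples x d).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 256))   # placeholder data, not a real dataset

# Input correlation (second-moment) matrix, d x d.
C = X.T @ X / X.shape[0]

# Eigenvalues in decreasing order (C is symmetric positive semi-definite).
eigvals = np.linalg.eigvalsh(C)[::-1]
eigvals = eigvals[eigvals > 0]

# A "sloppy" spectrum has a few large eigenvalues and many small ones spread
# roughly uniformly over a very wide range of scales.
print("decades spanned:", np.log10(eigvals[0] / eigvals[-1]))
print("fraction within 1% of the largest:", np.mean(eigvals > 0.01 * eigvals[0]))
```
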
I will argue that training a deep network is computationally tractable because for sloppy tasks, the training process explores an extremely low-dimensional (~0.001% of the dimensionality of the embedding space) manifold in the prediction space. Models with different neural architectures (fully-connected, convolutional, residual, and attention-based), training methods (stochastic gradient descent and variants), weight initializations (random vs. pre-training on random labels), and regularization techniques (weight-decay, batch-normalization, and data-augmentation) evolve along very similar trajectories in the prediction space when trained for the same task and traverse a very similar manifold.

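As a rough illustration of what it means to track trajectories in prediction space, the sketch below represents each checkpoint of a model by its predicted class probabilities on a fixed probe set and projects the resulting trajectory with ordinary PCA; this is a simplified stand-in, assuming such checkpointed probabilities are available, and it is not the information-geometric embedding developed in the papers listed below:

```python
import numpy as np

# Illustrative sketch (not the authors' method): embed a training trajectory in
# prediction space. Each checkpoint is summarized by its predicted class
# probabilities on a fixed probe set, flattened into one long vector; PCA then
# gives low-dimensional coordinates for the trajectory.

def trajectory_matrix(prob_checkpoints):
    """prob_checkpoints: list of (n_probe, n_classes) arrays, one per checkpoint."""
    return np.stack([p.reshape(-1) for p in prob_checkpoints])

def pca_embed(P, k=3):
    """Project the rows of P onto their top-k principal components."""
    P = P - P.mean(axis=0, keepdims=True)
    U, S, _ = np.linalg.svd(P, full_matrices=False)
    return U[:, :k] * S[:k]

# Toy usage: random probabilities stand in for real model checkpoints.
rng = np.random.default_rng(0)
fake_checkpoints = [rng.dirichlet(np.ones(10), size=500) for _ in range(20)]
coords = pca_embed(trajectory_matrix(fake_checkpoints))
print(coords.shape)   # (20, 3): one low-dimensional point per checkpoint
```
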
Publications

1. Yang, R., Mao, J. & Chaudhari, P. Does the Data Induce Capacity Control in Deep Learning? Proc. of the International Conference on Machine Learning (2022). arXiv: https://arxiv.org/abs/2110.14163
2. Mao, J., Griniasty, I., Yang, R., Teoh, H. K., Ramesh, R., Transtrum, M., Sethna, J. & Chaudhari, P. A Picture of the Prediction Space of Deep Neural Networks (in preparation).
3. Ramesh, R., Mao, J., Griniasty, I., Yang, R., Teoh, H. K., Transtrum, M., Sethna, J. & Chaudhari, P. A Picture of the Space of Learning Tasks (in preparation).

Presenters

  • Pratik Chaudhari

    University of Pennsylvania

Authors

  • Jialin Mao

    University of Pennsylvania

  • Itay Griniasty

    Cornell University

  • Rubing Yang

    University of Pennsylvania

  • Han Kheng Teoh

    Cornell University

  • Rahul Ramesh

    University of Pennsylvania

  • Mark K Transtrum

    Brigham Young University

  • James P Sethna

    Cornell University

  • Pratik Chaudhari

    University of Pennsylvania