
The Evolution of the Fisher Information Matrix During Deep Neural Network Training

ORAL

Abstract



Recently, deep neural networks (DNNs) have revolutionized nearly every area of machine learning, and their success has challenged our understanding. In particular, DNNs have empirically been shown to generalize well even in the overparameterized regime. Some correlates of generalization have been found, including flatness of the loss function (Jiang et al. 2019), and these have even been shown to be causally useful in improving generalization (Foret et al. 2021), but further study is required. Here, we study the evolution of the Fisher Information Matrix throughout training, in both the early and late phases, and identify a number of dynamical signatures of its behavior. While the Fisher often coincides with flatness-based measures such as the Hessian late in training, early in training the two do not in general align. In addition, the Fisher can be computed without labeled data, allowing its evaluation on held-out test data. Our method computes the exact Fisher and its eigendecomposition on various subsets of data throughout training, as often as every step along the training curve. In particular, we study the evolution of the Fisher across various dataset splits: train/test, per class, and per domain (in the out-of-distribution setting), and correlate these measures with generalization, both in and out of distribution.
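As a rough illustration of why no labels are needed, recall that the Fisher is an expectation under the model's own predictive distribution, F = (1/N) Σ_x Σ_c p_θ(c|x) ∇_θ log p_θ(c|x) ∇_θ log p_θ(c|x)^T, so only inputs enter the computation. The sketch below (not the authors' implementation; the tiny model, random inputs, and function names are placeholders chosen for illustration) computes this exact Fisher for a small classifier and eigendecomposes it.

```python
# Minimal sketch of computing the exact Fisher of a small classifier.
# Only inputs are used: the expectation over outputs is taken under the
# model's own softmax distribution, so unlabeled/held-out data suffices.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny model so the full (P x P) Fisher fits in memory.
model = nn.Sequential(nn.Linear(10, 16), nn.Tanh(), nn.Linear(16, 3))
params = [p for p in model.parameters() if p.requires_grad]
P = sum(p.numel() for p in params)

def exact_fisher(model, inputs):
    """F = (1/N) sum_x sum_c p(c|x) grad log p(c|x) grad log p(c|x)^T."""
    fisher = torch.zeros(P, P)
    for x in inputs:
        logits = model(x.unsqueeze(0))
        log_probs = F.log_softmax(logits, dim=-1).squeeze(0)
        probs = log_probs.exp().detach()
        for c in range(log_probs.numel()):
            grads = torch.autograd.grad(log_probs[c], params, retain_graph=True)
            g = torch.cat([gi.reshape(-1) for gi in grads])
            fisher += probs[c] * torch.outer(g, g)
    return fisher / inputs.shape[0]

# Unlabeled inputs (e.g. a held-out test split) are enough.
X = torch.randn(32, 10)
fisher = exact_fisher(model, X)
eigvals, eigvecs = torch.linalg.eigh(fisher)  # spectrum of the Fisher
print(eigvals[-5:])  # a few leading eigenvalues
```

In practice this exact computation only scales to small networks or data subsets; tracking it at many points along training, as described in the abstract, presumably relies on restricting to such subsets.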

Presenters

  • Chase W Goddard

    Princeton University

Authors

  • Chase W Goddard

    Princeton University

  • David J Schwab

    The Graduate Center, CUNY