Jacobians in Deep Neural Networks: Criticality and Beyond
ORAL
Abstract
Good parameter initialization is crucial for training deep neural networks. A correct initialization ensures that the network function and its gradients remain well-behaved with depth. The conditions for such an initialization, known as “criticality”, guide the choice of the network's hyperparameters.
Jacobians between layer outputs of the network are central to this analysis. The norm of the Jacobian identifies critical initializations, while the spectrum of the Jacobian matrix carries information about fluctuations in the gradients; these fluctuations play an important role in very deep networks.
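As an illustration of the kind of quantity involved, here is a minimal NumPy sketch (not code from the paper; the function name avg_jacobian_norm and all parameter values are assumptions) that Monte-Carlo-estimates the averaged Jacobian norm (1/N) Tr(J Jᵀ) for a ReLU MLP at initialization. Near the ReLU critical point (σ_w² = 2, σ_b = 0) the norm stays O(1) with depth; away from it, it grows or decays exponentially.

```python
import numpy as np

def avg_jacobian_norm(depth, width, sigma_w, sigma_b, n_trials=10, seed=0):
    """Monte-Carlo estimate of (1/width) * Tr(J J^T), where J is the Jacobian
    of the last hidden layer with respect to the input, for a ReLU MLP at
    initialization. Function and argument names are illustrative."""
    rng = np.random.default_rng(seed)
    norms = []
    for _ in range(n_trials):
        h = rng.standard_normal(width)      # random input
        J = np.eye(width)                   # Jacobian of the input w.r.t. itself
        for _ in range(depth):
            W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
            b = rng.standard_normal(width) * sigma_b
            pre = W @ h + b
            D = np.diag((pre > 0).astype(float))  # derivative of ReLU
            J = D @ W @ J                   # chain rule, layer by layer
            h = np.maximum(pre, 0.0)
        norms.append(np.trace(J @ J.T) / width)
    return float(np.mean(norms))

# sigma_w^2 = 2 is critical for ReLU: the norm stays O(1);
# smaller sigma_w makes it vanish, larger makes it explode with depth.
for sw in (1.0, np.sqrt(2.0), 2.0):
    print(f"sigma_w = {sw:.3f}:",
          avg_jacobian_norm(depth=30, width=256, sigma_w=sw, sigma_b=0.0))
```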
I will begin my talk by formulating criticality in terms of the Jacobian norm. Using this formulation, I will show that it is possible to design networks that are “everywhere-critical”, i.e. critical irrespective of the choice of initialization, by incorporating LayerNorm/BatchNorm and residual connections. I will then discuss modern architectures that employ this combination, followed by experimental results demonstrating the effect of criticality on training. Finally, using the Jacobian spectrum, I will derive additional constraints on the hyperparameters in the everywhere-critical case.
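The sketch below is again purely illustrative: it assumes a pre-LayerNorm ReLU residual block (the exact block structure studied in the paper may differ) and shows the qualitative effect of combining LayerNorm with a skip connection: for an arbitrary, far-from-critical weight scale, the squared norm of the activations grows only linearly with depth instead of exponentially.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Parameter-free LayerNorm over the feature dimension."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def preln_residual_block(x, W, b):
    """One pre-LayerNorm residual block: x -> x + W relu(LayerNorm(x)) + b.
    Illustrative block structure, not necessarily the one used in the paper."""
    return x + W @ np.maximum(layer_norm(x), 0.0) + b

rng = np.random.default_rng(0)
width, depth, sigma_w = 128, 100, 3.0   # sigma_w deliberately far from the critical value
x = rng.standard_normal(width)
for l in range(1, depth + 1):
    W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
    x = preln_residual_block(x, W, b=np.zeros(width))
    if l % 25 == 0:
        # squared activation norm grows roughly linearly with depth for any sigma_w
        print(f"depth {l:3d}: ||x||^2 / width = {float(x @ x) / width:.1f}")
```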
–
Publication: Doshi, D., He, T. and Gromov, A. "Critical Initialization of Wide and Deep Neural Networks through Partial Jacobians: General Theory and Applications". arXiv:2111.12143v3
Presenters
-
Darshil H Doshi
University of Maryland, College Park
Authors
-
Darshil H Doshi
University of Maryland, College Park
-
Tianyu He
University of Maryland, College Park
-
Andrey Gromov
University of Maryland, College Park