AutoInit: Automatic Initialization via Jacobian Tuning
ORAL
Abstract
Good initialization is essential for training Deep Neural Networks (DNNs). Often such an initialization is found through trial and error, which must be repeated every time an architecture is substantially modified, or is inherited from smaller networks, leading to sub-optimal initialization. We introduce a new and cheap algorithm that finds a good initialization automatically for general architectures. The algorithm utilizes the Jacobian between adjacent network blocks to tune the network hyperparameters to criticality. We analyze the dynamics of the algorithm for fully connected networks with ReLU activations and derive conditions for its convergence. We then show that our method provides an automatic one-shot initialization for a variety of modern architectures with normalization layers and residual connections, and that the initializations it finds perform well on vision tasks.
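To make the tuning idea concrete, here is a minimal NumPy sketch, not the authors' implementation: for a fully connected ReLU network, it estimates the mean squared singular value of each block-to-block Jacobian and multiplicatively rescales the weights toward the critical value of 1. The damped update exponent and loop structure are illustrative assumptions; the fixed point it reaches, sigma_w^2 = 2 (He initialization), is the known criticality condition for ReLU.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth, steps = 512, 8, 30

# Start from a deliberately mis-scaled Gaussian initialization.
sigma0 = 0.5
weights = [rng.normal(0.0, sigma0 / np.sqrt(width), (width, width))
           for _ in range(depth)]

def relu(x):
    return np.maximum(x, 0.0)

for _ in range(steps):
    h = rng.normal(size=width)  # fresh random input each pass
    for l, W in enumerate(weights):
        pre = W @ h
        d = (pre > 0).astype(float)  # ReLU derivative (diagonal mask)
        # Mean squared singular value of the block Jacobian J = diag(d) @ W:
        j2 = np.sum((d[:, None] * W) ** 2) / width
        # Damped multiplicative rescale pushing j2 toward 1
        # (the update rate of 1/4 is an assumption for illustration).
        weights[l] = W / j2 ** 0.25
        h = relu(pre)

# At criticality for ReLU, the per-weight variance should approach 2/width.
sigma2 = np.mean([W.var() * width for W in weights])
print(f"tuned sigma_w^2 = {sigma2:.2f} (ReLU criticality predicts 2.0)")
```

Under these assumptions the rescaling converges geometrically: each pass halves the log-distance of sigma_w^2 from its critical value.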
Presenters
- Tianyu He (University of Maryland, College Park)

Authors
- Tianyu He (University of Maryland, College Park)
- Darshil H Doshi (University of Maryland, College Park)
- Andrey Gromov (University of Maryland, College Park)