Understanding Layer Normalization in Deep Neural Networks
ORAL
Abstract
Deep neural networks (DNNs) have proven to be very powerful in classification, language modeling, and computer vision problems. However, training DNNs is computationally hard due to the huge number of parameters. A good initialization of a DNN can save an enormous amount of computational power. Layer normalization (LayerNorm) was introduced to make training faster and generalization easier. We show that the effect of LayerNorm can be studied quantitatively in the infinite-width limit by plotting phase diagrams of DNNs. We then show empirically that, in many cases, a DNN with LayerNorm layers initialized close to the phase boundary achieves the best performance in both training and validation.
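For readers unfamiliar with the operation discussed in the abstract, the following is a minimal NumPy sketch of LayerNorm: each pre-activation vector is normalized across its features, then rescaled and shifted by learned parameters. The function and parameter names here are illustrative, not taken from the authors' code.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each sample over its feature (last) axis,
    # then apply a learned scale (gamma) and shift (beta).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Example: a batch of 4 pre-activation vectors of width 8.
x = np.random.randn(4, 8)
gamma = np.ones(8)   # scale parameter, learned during training
beta = np.zeros(8)   # shift parameter, learned during training
y = layer_norm(x, gamma, beta)
print(y.mean(axis=-1))  # approximately 0 per sample
print(y.std(axis=-1))   # approximately 1 per sample
```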
Presenters
- Tianyu He (Brown University)

Authors
- Tianyu He (Brown University)
- Darshil H Doshi (Brown University)
- Andrey Gromov (Brown University)