
Understanding Layer Normalization in Deep Neural Networks

ORAL

Abstract

Deep neural networks (DNNs) have proven to be very powerful in classification, language modeling, and computer vision problems. However, training DNNs is computationally expensive due to the huge number of parameters. A good initialization of a DNN can save enormous amounts of computational power. Layer normalization (LayerNorm) was introduced to make training faster and improve generalization. We show that the effect of LayerNorm can be studied quantitatively in the infinite-width limit by plotting phase diagrams of DNNs. We then show empirically that, in many cases, a DNN with LayerNorm layers initialized close to the phase boundary achieves the best performance in both training and validation.
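For reference, layer normalization standardizes each layer's pre-activations using the mean and variance computed across that layer's units, then applies a learnable gain and bias. The following is a minimal NumPy sketch of this operation; the function name layer_norm, the epsilon value, and the default gain/bias are illustrative assumptions, not details taken from the abstract.

    import numpy as np

    def layer_norm(h, gamma=1.0, beta=0.0, eps=1e-5):
        # Mean and variance are taken over the units of a single layer
        # (last axis), independently for each input example.
        mu = h.mean(axis=-1, keepdims=True)
        var = h.var(axis=-1, keepdims=True)
        # Standardize, then apply a learnable gain (gamma) and bias (beta).
        return gamma * (h - mu) / np.sqrt(var + eps) + beta

    # Example: normalize the pre-activations of a width-4 layer for 2 inputs.
    h = np.array([[1.0, 2.0, 3.0, 4.0],
                  [10.0, 0.0, -10.0, 5.0]])
    print(layer_norm(h))

Because the statistics are computed per example rather than per mini-batch, the normalization behaves identically at training and inference time, which is part of why it is convenient for the wide-network analyses described above.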

Presenters

  • Tianyu He

    Brown University

Authors

  • Tianyu He

    Brown University

  • Darshil H Doshi

    Brown University

  • Andrey Gromov

    Brown University