Understanding Layer Normalization in Deep Neural Networks
ORAL
Abstract
Deep neural networks (DNNs) have proven to be very powerful in classification, language modeling, and computer vision problems. However, training DNNs is computationally hard due to the huge number of parameters. A good initialization of a DNN can save an enormous amount of computational power. Layer normalization (LayerNorm) was introduced to make training faster and generalization easier. We show that the effect of LayerNorm can be studied quantitatively in the infinite-width limit by plotting phase diagrams of DNNs. We then show empirically that, in many cases, a DNN with LayerNorm layers initialized close to the phase boundary achieves the best performance in both training and validation.
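For readers unfamiliar with the operation discussed in the abstract, the following is a minimal NumPy sketch of LayerNorm: each pre-activation vector is normalized across its features, then rescaled and shifted by learned parameters. The function and parameter names here are illustrative, not taken from the authors' code.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each sample over its feature (last) axis,
    # then apply a learned scale (gamma) and shift (beta).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Example: a batch of 4 pre-activation vectors of width 8.
x = np.random.randn(4, 8)
gamma = np.ones(8)   # scale parameter, learned during training
beta = np.zeros(8)   # shift parameter, learned during training
y = layer_norm(x, gamma, beta)
print(y.mean(axis=-1))  # approximately 0 per sample
print(y.std(axis=-1))   # approximately 1 per sample
```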
Presenters
- Tianyu He (Brown University)

Authors
- Tianyu He (Brown University)
- Darshil H Doshi (Brown University)
- Andrey Gromov (Brown University)