Toward Statistical Mechanics of Deep Learning
ORAL · Invited
Abstract
The groundbreaking success of deep learning in many real-world tasks has triggered an intense effort to theoretically understand its power and limitations in the training and generalization of complex tasks. I will present progress in the theory of Deep Learning, based on the statistical mechanics of weight space in the appropriate thermodynamic limit. I will first discuss Deep Linear Neural Networks (DLNNs). Despite the linearity of the units, learning in DLNNs is highly nonlinear; hence, studying its properties reveals some of the essential features of nonlinear Deep Neural Networks.
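As a minimal worked sketch of this point (the notation is assumed here for illustration and is not taken from the talk), a depth-L linear network with readout weights a computes

f(x) = a^{\top} W_{L} W_{L-1} \cdots W_{1} x ,

which is linear in the input x; yet the squared-error training energy

E(\{W_l\}, a) = \tfrac{1}{2} \sum_{\mu=1}^{P} \big( y^{\mu} - f(x^{\mu}) \big)^{2}

is a polynomial of degree 2(L+1) in the weights, so the gradient with respect to any single layer W_l involves products of all the other layers, making the learning problem non-convex and the learning dynamics nonlinear even though the input-output map is linear.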
To derive properties of network weight space after learning we introduce the Back-Propagating Kernel Renormalization (BPKR), which allows for the incremental integration of the network weights layer-by-layer starting from the network output layer and progressing backward until the first layer's weights are integrated out. This procedure allows us to evaluate important network properties, such as its generalization error, the role of network width and depth, the impact of the size of the training set, the effects of weight regularization and learning stochasticity, as well as the emergent neural representations in each layer.
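As a hedged schematic of this procedure (again with assumed notation that may differ from the paper's conventions), one starts from the Gibbs distribution over weights implied by the training energy E above and a Gaussian (L2) weight regularization,

Z = \int da \, \prod_{l=1}^{L} dW_{l} \, \exp\!\Big[ -\beta E(\{W_l\}, a) - \tfrac{1}{2\sigma^{2}} \Big( \|a\|^{2} + \sum_{l=1}^{L} \|W_{l}\|_{F}^{2} \Big) \Big] ,

integrates out the readout weights a, which leaves an effective action that depends on the training data only through the kernel of the top hidden layer, K_{L}(x^{\mu}, x^{\nu}) \propto h_{L}(x^{\mu}) \cdot h_{L}(x^{\nu}) with h_{L}(x) = W_{L} \cdots W_{1} x, and then integrates out W_{L}, W_{L-1}, \ldots, W_{1} in turn; each backward step renormalizes this kernel, which is the sense in which the kernel is "back-propagated" through the layers.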
Unlike most statistical mechanical investigations of learning in neural networks, the new theory does not make specific assumptions about the statistics of the inputs or the desired targets; thus, it can be applied to realistic data and tasks.
A heuristic extension of the BPKR to nonlinear DNNs with rectified linear units (ReLU) yields a surprisingly good fit to numerical simulations for networks of modest depth over a wide regime of parameters. Extensions, including to deep convolutional networks and other interesting families of nonlinear DNNs, will be discussed.
Publication: Qianyi Li and Haim Sompolinsky (2021). Statistical Mechanics of Deep Linear Neural Networks: The Back-Propagating Kernel Renormalization. Physical Review X 11, 031059.
Presenters
-
Haim I Sompolinsky
The Hebrew University of Jerusalem; Center for Brain Science, Harvard University
Authors
-
Haim I Sompolinsky
The Hebrew University of Jerusalem; Center for Brain Science, Harvard University
-
Qianyi Li
Biophysics Program, Harvard University