GSNP Dissertation Award: Statistical mechanics of Bayesian inference and learning in neural networks
ORAL · Invited
Abstract
This thesis collects a few of my essays toward understanding representation learning and generalization in neural networks. I focus on the model setting of Bayesian learning and inference, where the problem of deep learning is naturally viewed through the lens of statistical mechanics. First, I consider properties of freshly initialized deep networks, with all parameters drawn according to Gaussian priors. I provide exact solutions for the marginal prior predictive of networks with isotropic priors and linear or rectified-linear activation functions. I then study the effect of introducing structure to the priors of linear networks from the perspective of random matrix theory. Turning to memorization, I consider how the choice of nonlinear activation function affects the storage capacity of treelike neural networks. Then, I come at last to representation learning. I study the structure of learned representations in Bayesian neural networks at large but finite width, which are amenable to perturbative treatment. I then show how the ability of these networks to generalize when presented with unseen data is affected by representational flexibility, through precise comparison to models with frozen, random representations. In the final portion of this thesis, I bring a geometric perspective to bear on the structure of neural network representations. I first consider how the demand of fast inference shapes optimal representations in recurrent networks. Then, I consider the geometry of representations in deep object classification networks from a Riemannian perspective. In total, this thesis begins to elucidate the structure and function of optimally distributed neural codes in artificial neural networks.
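To make the setting concrete, below is a minimal numerical sketch of the marginal prior predictive of a deep rectified-linear network with isotropic Gaussian priors, estimated by Monte Carlo sampling of the weights. This is an illustrative assumption-laden example (standard 1/sqrt(fan-in) prior scaling, scalar readout), not the exact analytical solutions described in the thesis.

```python
# Monte Carlo sketch of the marginal prior predictive f(x) for a deep ReLU
# network with isotropic Gaussian priors on all weights (1/sqrt(fan-in) scaling).
import numpy as np

rng = np.random.default_rng(0)

def sample_prior_predictive(x, widths, n_samples=10_000, sigma_w=1.0):
    """Draw scalar network outputs f(x) with weights sampled from the Gaussian prior."""
    outputs = np.empty(n_samples)
    for s in range(n_samples):
        h, fan_in = x, x.shape[0]
        for width in widths:
            W = rng.normal(0.0, sigma_w / np.sqrt(fan_in), size=(width, fan_in))
            h = np.maximum(W @ h, 0.0)  # ReLU hidden layer
            fan_in = width
        w_out = rng.normal(0.0, sigma_w / np.sqrt(fan_in), size=fan_in)
        outputs[s] = w_out @ h  # scalar linear readout
    return outputs

x = np.ones(10) / np.sqrt(10.0)  # unit-norm test input
samples = sample_prior_predictive(x, widths=[50, 50])
print(f"prior predictive mean ~ {samples.mean():.3f}, variance ~ {samples.var():.3f}")
```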
Publication: See jzv.io for a complete list of papers
Presenters
- Jacob Zavatone-Veth, Harvard University
Authors
- Jacob Zavatone-Veth, Harvard University