Physics-Inspired Model Compression of Neural Networks
ORAL
Abstract
Model compression is a subfield of machine learning concerned with methods for reducing a model's size while minimizing negative effects on its performance. We introduce a new method, inspired by statistical physics, for compressing neural networks: in particular, we treat the parameters of a network during training as a system of particles subjected both to gradients of the objective function and to pairwise attractive interactions, and we show how this treatment causes the parameter distribution of the trained network to concentrate around a discrete set of values. We draw explicit connections between this method and quantization, a popular form of model compression in which the number of bits required to represent each parameter in memory is reduced. We demonstrate that this method produces high-performance, memory-efficient networks across a range of models and tasks. We analyze the parameter distributions that result from the application of our method, and comment on surprising structural features of these distributions. We suggest our method is a powerful tool for unraveling the complexity of overparameterized networks.
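As a rough illustration of the idea described above, the sketch below adds a pairwise attractive "interaction energy" between parameter values to an ordinary training loss, so that parameters feel both the task gradient and a force pulling nearby values together, encouraging the trained weights to cluster around a discrete set of values amenable to quantization. The Gaussian interaction kernel, its width `sigma`, the coupling strength `lam`, and the random subsampling of parameter pairs are all assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pairwise_attraction_energy(params, sigma=0.05, n_samples=1024):
    """Attractive interaction energy between randomly sampled parameter values.

    Each pair (w_i, w_j) contributes -exp(-(w_i - w_j)^2 / (2 sigma^2)), so
    minimizing this energy pulls parameter values toward one another and
    encourages clustering around a few shared values. (Hypothetical kernel;
    the paper's interaction may differ.)
    """
    flat = torch.cat([p.reshape(-1) for p in params])
    idx = torch.randint(0, flat.numel(), (n_samples,), device=flat.device)
    w = flat[idx]
    diff = w.unsqueeze(0) - w.unsqueeze(1)            # (n_samples, n_samples)
    return -torch.exp(-diff.pow(2) / (2 * sigma**2)).mean()

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
lam = 1e-2  # coupling strength between task loss and interaction energy (assumed)

def training_step(x, y):
    optimizer.zero_grad()
    task_loss = F.cross_entropy(model(x), y)
    interaction = pairwise_attraction_energy(model.parameters())
    loss = task_loss + lam * interaction
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch to show the call pattern.
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
print(training_step(x, y))
```

After training with such a term, the empirical distribution of weights tends toward a small number of modes, which can then be mapped to a low-bit codebook; the abstract's connection to quantization corresponds to this final discretization step.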
Presenters
- Daniel T Bernstein, Princeton University
Authors
- Daniel T Bernstein, Princeton University
- David J Schwab, The Graduate Center, CUNY