
Information theory of high dimensional linear regression

ORAL

Abstract

Quantitative characterization of generalization is key to understanding learning in virtually all settings, from classical statistical modeling to modern machine learning. While statistical learning in the data-abundant regime is well understood, relatively little is known about generalization in the over-parametrized regime, where model parameters can far outnumber available data points. Here we demonstrate that recent advances in information-theoretic analyses of generalization provide a general framework for characterizing practical learning algorithms in both data-abundant and data-limited regimes. We consider randomized ridge regression in the thermodynamic limit, where we send the numbers of model parameters and data points to infinity while fixing their ratio. We quantify generalization errors using information-theoretic measures and analyze an information-theoretic analog of the bias-variance decomposition as we vary regularization strength, data structure, and the degree of over-parametrization. Our results offer fresh insight into the phenomenon of benign overfitting, which describes the surprisingly good generalization properties of perfectly fitted models. Finally, we show how the information bottleneck method can be used to identify data-dependent optimal hyperparameters of learning algorithms in the spirit of meta-learning.
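A minimal sketch (not the authors' exact setup) of the kind of experiment the abstract describes: ridge regression on Gaussian data with a random teacher, tracking test error as the parameter-to-sample ratio p/n crosses from the under- to the over-parametrized regime. The data model, noise level, and regularization strength below are illustrative assumptions.

```python
# Illustrative sketch: ridge regression test error versus over-parametrization.
import numpy as np

rng = np.random.default_rng(0)

def ridge_test_error(n, p, lam, sigma=0.1, n_test=2000):
    """Fit ridge regression on n samples with p features; return test MSE."""
    w_star = rng.normal(size=p) / np.sqrt(p)       # random teacher weights
    X = rng.normal(size=(n, p)) / np.sqrt(p)       # isotropic Gaussian inputs
    y = X @ w_star + sigma * rng.normal(size=n)    # noisy labels
    # Ridge estimate: (X^T X + lam I)^{-1} X^T y
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    X_te = rng.normal(size=(n_test, p)) / np.sqrt(p)
    y_te = X_te @ w_star + sigma * rng.normal(size=n_test)
    return np.mean((X_te @ w_hat - y_te) ** 2)

n = 200
for ratio in (0.25, 0.5, 1.0, 2.0, 4.0):           # p/n: under- to over-parametrized
    p = int(ratio * n)
    err = np.mean([ridge_test_error(n, p, lam=1e-2) for _ in range(10)])
    print(f"p/n = {ratio:4.2f}  test MSE = {err:.4f}")
```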

Presenters

  • Vudtiwat Ngampruetikorn

The Graduate Center, City University of New York

Authors

  • Vudtiwat Ngampruetikorn

The Graduate Center, City University of New York

  • David J Schwab

The Graduate Center, City University of New York