Generalized aliasing: a new paradigm for learning and inference
ORAL
Abstract
A central problem in science is to use samples of an unknown function to build a model that predicts function values for unseen inputs. Classically, we think of model complexity as a trade-off between models that are too simple or inflexible (high bias) and models that are too complex (high variance). At odds with this intuition, modern machine learning offers examples of over-parameterized models, with far more parameters than data points. These models often behave counterintuitively: prediction error decreases as model complexity grows, a phenomenon recently dubbed “double descent”.
We introduce a new way of reasoning about models, the generalized aliasing decomposition, that supersedes the bias-variance trade-off paradigm. This new framework has a simple mathematical formulation and shows that prediction error can be explained, predicted, and controlled by understanding three aspects of modeling: generalized aliasing, data insufficiency, and model insufficiency. This decomposition explains not only the bias-variance trade-off in classical models but also the counterintuitive “double descent” behavior of over-parameterized models. And because it reveals structure inherent in both the model class and the sample points, it can inform decisions about experimental design and model selection before data labels are collected.
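A minimal numerical sketch of the double-descent phenomenon the abstract describes, using minimum-norm least-squares fits to random Fourier features (an illustrative setup chosen here, not the paper's own experiments or the generalized aliasing decomposition itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (not from the paper): fit noisy samples of sin(2*pi*x)
# with p random Fourier features, using the minimum-norm least-squares solution.
n_train = 20
x_train = rng.uniform(-1, 1, n_train)
x_test = np.linspace(-1, 1, 200)
truth = lambda x: np.sin(2 * np.pi * x)
y_train = truth(x_train) + 0.1 * rng.normal(size=n_train)

max_p = 100
freqs = rng.normal(scale=5.0, size=max_p)
phases = rng.uniform(0, 2 * np.pi, size=max_p)

def phi(x, p):
    """Design matrix built from the first p random Fourier features."""
    return np.cos(np.outer(x, freqs[:p]) + phases[:p])

train_errs, test_errs = {}, {}
for p in (5, 10, 20, 40, 100):          # sweep model complexity past p = n_train
    Phi = phi(x_train, p)
    w = np.linalg.pinv(Phi) @ y_train   # minimum-norm least-squares coefficients
    train_errs[p] = np.mean((Phi @ w - y_train) ** 2)
    test_errs[p] = np.mean((phi(x_test, p) @ w - truth(x_test)) ** 2)

for p in sorted(test_errs):
    print(f"p = {p:3d}  train MSE = {train_errs[p]:.2e}  test MSE = {test_errs[p]:.2e}")
```

In sweeps like this, test error typically peaks near the interpolation threshold (p close to n_train) and falls again as p grows, while sufficiently over-parameterized fits interpolate the training data almost exactly.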
–
Publication: https://arxiv.org/pdf/2408.08294
Presenters
-
Gus L.W. Hart
Brigham Young University
Authors
-
Gus L.W. Hart
Brigham Young University
-
Mark K. Transtrum
Brigham Young University
-
Tyler J. Jarvis
Brigham Young University
-
Jared P. Whitehead
Brigham Young University