Model-free quantification of completeness, uncertainties, and outliers in atomistic machine learning using information theory
ORAL
Abstract
Quantifying information content is needed for several problems in atomistic machine learning (ML), from training set curation and uncertainty quantification (UQ) to obtaining insights from large datasets or trajectories. However, atomistic ML typically relies on unsupervised learning or model predictions to quantify the information in simulation or training data. Here, we introduce a theoretical strategy that leads to a model-free approach for quantifying information content in atomistic datasets. We show that the information entropy of atom-centered representations explains common heuristics in atomistic ML, from learning curves to generalization errors. Our method also provides a UQ strategy that quantifies epistemic uncertainty and detects out-of-distribution samples without requiring a model. We have used these results to explain error trends in datasets for ML potentials, detect rare events in simulations, and benchmark the reliability of interatomic potentials. This work provides a new tool for data-driven atomistic simulation, combining synergistic efforts in ML, simulations, and theory.
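The method itself is detailed in the linked preprint. As a rough illustration of the underlying idea only (not the authors' implementation), the minimal sketch below estimates the differential entropy of a set of atom-centered descriptors with a Gaussian kernel density estimate and scores out-of-distribution samples by their negative log-likelihood under the training density. The descriptor arrays, the bandwidth h, and all function names are illustrative assumptions.

```python
# Minimal sketch, assuming descriptors are given as an (N, d) NumPy array.
# Uses a Gaussian KDE as a generic stand-in for the paper's entropy estimator.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import logsumexp

def kde_log_density(X, queries, h=0.1):
    """log p(q) for each query point under a Gaussian KDE fit to X."""
    n, d = X.shape
    sq = cdist(queries, X, "sqeuclidean")      # (M, N) squared distances
    log_kernels = -sq / (2.0 * h**2)           # unnormalized Gaussian kernels
    log_norm = np.log(n) + 0.5 * d * np.log(2.0 * np.pi * h**2)
    return logsumexp(log_kernels, axis=1) - log_norm

def dataset_entropy(X, h=0.1):
    """Resubstitution entropy estimate, H ~ -(1/N) sum_i log p(x_i).
    (Biased low by the self-term; adequate for a qualitative sketch.)"""
    return -kde_log_density(X, X, h).mean()

def surprise(X_train, X_test, h=0.1):
    """Negative log-likelihood of test descriptors under the training
    density; large values flag out-of-distribution (outlier) samples."""
    return -kde_log_density(X_train, X_test, h)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(500, 8))                     # in-distribution
    test = np.vstack([rng.normal(size=(5, 8)),            # in-distribution
                      rng.normal(loc=6.0, size=(5, 8))])  # outliers
    print("entropy:", dataset_entropy(train))
    print("surprise:", surprise(train, test).round(1))
```

In this toy setup, the shifted test points receive much larger surprise scores than the in-distribution ones, mirroring how a model-free information measure can flag outliers; the bandwidth h controls the scale at which the descriptor distribution is resolved.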
Publication: https://doi.org/10.48550/arXiv.2404.12367
Presenters
- Daniel Schwalbe-Koda (UCLA)
Authors
- Daniel Schwalbe-Koda (UCLA)
- Sebastien Hamel (Lawrence Livermore National Laboratory)
- Babak Sadigh (Lawrence Livermore National Laboratory)
- Fei Zhou (Lawrence Livermore National Laboratory)
- Vincenzo Lordi (Lawrence Livermore National Laboratory)