Model-free quantification of completeness, uncertainties, and outliers in atomistic machine learning using information theory
ORAL
Abstract
Quantifying information content is needed for several problems in atomistic machine learning (ML), from training set curation and uncertainty quantification (UQ) to obtaining insights from large datasets or trajectories. However, atomistic ML typically relies on unsupervised learning or model predictions to quantify the information in simulation or training data. Here, we introduce a theoretical strategy that leads to a model-free approach for quantifying information content in atomistic datasets. We show that the information entropy of atom-centered representations explains common heuristics in atomistic ML, from learning curves to generalization errors. Our method also provides a UQ strategy that quantifies epistemic uncertainty and detects out-of-distribution samples without requiring a model. We have used these results to explain error trends in datasets for ML potentials, detect rare events in simulations, and benchmark the reliability of interatomic potentials. This work provides a new tool for data-driven atomistic simulation, combining synergistic efforts in ML, simulations, and theory.
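The method itself is detailed in the linked preprint. As a rough illustration of the underlying idea only (not the authors' implementation), the minimal sketch below estimates the differential entropy of a set of atom-centered descriptors with a Gaussian kernel density estimate and scores out-of-distribution samples by their negative log-likelihood under the training density. The descriptor arrays, the bandwidth h, and all function names are illustrative assumptions.

```python
# Minimal sketch, assuming descriptors are given as an (N, d) NumPy array.
# Uses a Gaussian KDE as a generic stand-in for the paper's entropy estimator.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import logsumexp

def kde_log_density(X, queries, h=0.1):
    """log p(q) for each query point under a Gaussian KDE fit to X."""
    n, d = X.shape
    sq = cdist(queries, X, "sqeuclidean")      # (M, N) squared distances
    log_kernels = -sq / (2.0 * h**2)           # unnormalized Gaussian kernels
    log_norm = np.log(n) + 0.5 * d * np.log(2.0 * np.pi * h**2)
    return logsumexp(log_kernels, axis=1) - log_norm

def dataset_entropy(X, h=0.1):
    """Resubstitution entropy estimate, H ~ -(1/N) sum_i log p(x_i).
    (Biased low by the self-term; adequate for a qualitative sketch.)"""
    return -kde_log_density(X, X, h).mean()

def surprise(X_train, X_test, h=0.1):
    """Negative log-likelihood of test descriptors under the training
    density; large values flag out-of-distribution (outlier) samples."""
    return -kde_log_density(X_train, X_test, h)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(500, 8))                     # in-distribution
    test = np.vstack([rng.normal(size=(5, 8)),            # in-distribution
                      rng.normal(loc=6.0, size=(5, 8))])  # outliers
    print("entropy:", dataset_entropy(train))
    print("surprise:", surprise(train, test).round(1))
```

In this toy setup, the shifted test points receive much larger surprise scores than the in-distribution ones, mirroring how a model-free information measure can flag outliers; the bandwidth h controls the scale at which the descriptor distribution is resolved.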
Publication: https://doi.org/10.48550/arXiv.2404.12367
Presenters
- Daniel Schwalbe-Koda (UCLA)
Authors
- Daniel Schwalbe-Koda (UCLA)
- Sebastien Hamel (Lawrence Livermore National Laboratory)
- Babak Sadigh (Lawrence Livermore National Laboratory)
- Fei Zhou (Lawrence Livermore National Laboratory)
- Vincenzo Lordi (Lawrence Livermore National Laboratory)