Model-free quantification of completeness, uncertainties, and outliers in atomistic machine learning using information theory

ORAL

Abstract

Quantifying information content is needed for several problems in atomistic machine learning (ML), from training set curation and uncertainty quantification (UQ) to obtaining insights from large datasets or trajectories. However, atomistic ML often relies on unsupervised learning or model predictions to quantify the information in simulation or training data. Here, we introduce a theoretical strategy leading to a model-free approach for quantifying the information content of atomistic datasets. We show that the information entropy of atom-centered representations explains common heuristics in atomistic ML, from learning curves to generalization errors. Our method also yields a UQ strategy that quantifies epistemic uncertainty and detects out-of-distribution samples without requiring a trained model. These results have been used to explain error trends in datasets for ML potentials, detect rare events in simulations, and benchmark the reliability of interatomic potentials. This work provides a new tool for data-driven atomistic simulation, with synergistic efforts in ML, simulations, and theory.
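The model-free idea described in the abstract can be illustrated with a generic kernel-density estimate of entropy over atom-centered descriptors. This is a minimal sketch, not the authors' exact formulation: the Gaussian kernel, the fixed `bandwidth` parameter, and the per-sample "surprise" score used for outlier detection are all illustrative assumptions.

```python
import numpy as np

def info_entropy(X, bandwidth=1.0):
    """Kernel-density estimate of the entropy of a descriptor set
    X with shape (n_samples, n_features). Additive constants from
    the kernel normalization are dropped, so only entropy
    *differences* between datasets are meaningful here."""
    # Pairwise squared distances between all descriptor vectors
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel density at each sample (bandwidth is a free choice)
    density = np.mean(np.exp(-sq / (2.0 * bandwidth ** 2)), axis=1)
    # Entropy ~ mean negative log-density over the dataset
    return -np.mean(np.log(density))

def surprise(X_ref, x_new, bandwidth=1.0):
    """Negative log kernel density of a new sample under a reference
    set: a model-free score that is large for out-of-distribution
    (low-density) samples."""
    sq = np.sum((X_ref - x_new) ** 2, axis=-1)
    density = np.mean(np.exp(-sq / (2.0 * bandwidth ** 2)))
    return -np.log(density)
```

In this sketch, a more diverse dataset (descriptors spread over a larger region) has higher entropy, and a sample far from the reference distribution receives a high surprise score — without ever training a model.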

Publication: https://doi.org/10.48550/arXiv.2404.12367

Presenters

  • Daniel Schwalbe-Koda

    UCLA

Authors

  • Daniel Schwalbe-Koda

    UCLA

  • Sebastien Hamel

    Lawrence Livermore National Laboratory

  • Babak Sadigh

    Lawrence Livermore National Laboratory

  • Fei Zhou

Lawrence Livermore National Laboratory

  • Vincenzo Lordi

    Lawrence Livermore National Laboratory