Efficient, lossless compression of atomistic datasets with information theory
ORAL
Abstract
Machine learning potentials have been increasingly used to predict potential energy surfaces of atomistic systems with high accuracy and efficiency. However, while larger datasets often lead to improved performance, training models on them incurs substantial computational cost. Ideally, we want algorithms that compress datasets to reduce training times without compromising model accuracy. Here, we describe an algorithm that compresses atomistic datasets based on information theory. First, we present the theoretical foundation behind the algorithm and show how it compresses datasets more effectively than other widely used methods. Then, by testing model performance on datasets outside the distribution of the training data, we show that our approach systematically leads to richer datasets and models with higher generalization power. The work is distributed as part of the QUESTS package, allowing efficient compression of atomistic datasets.
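The abstract does not spell out the selection procedure, but the core idea of information-driven compression can be illustrated with a minimal, hypothetical sketch: greedily retain the atomic environments that are most novel in descriptor space, a simple proxy for maximizing the information content of the kept subset. This is not the QUESTS implementation; the function name and descriptor input are illustrative assumptions.

```python
import numpy as np


def compress_dataset(descriptors: np.ndarray, n_keep: int) -> np.ndarray:
    """Greedily select a diverse subset of atomic-environment descriptors.

    Hypothetical sketch (not the QUESTS algorithm): each step keeps the
    environment farthest, in descriptor space, from everything already
    selected, a crude proxy for maximizing the information entropy of
    the retained subset.
    """
    selected = [0]  # seed the subset with the first environment
    # minimum distance from every point to the current selected set
    dmin = np.linalg.norm(descriptors - descriptors[0], axis=1)
    while len(selected) < n_keep:
        idx = int(np.argmax(dmin))  # most novel remaining environment
        selected.append(idx)
        dnew = np.linalg.norm(descriptors - descriptors[idx], axis=1)
        dmin = np.minimum(dmin, dnew)
    return np.array(selected)


# Example: compress 200 random 8-dimensional descriptors down to 20
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
kept = compress_dataset(X, 20)
```

In practice, an entropy-based criterion (as in the work above) would score candidate subsets by an explicit information measure rather than raw distances, but the greedy "keep the most novel sample" loop conveys the flavor of the approach.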
Presenters
-
Benjamin YU
University of California, Los Angeles
Authors
-
Benjamin YU
University of California, Los Angeles
-
Daniel Schwalbe-Koda
UCLA