Improving Molecular Force Fields Across Configurational Space by Combining Supervised and Unsupervised Machine Learning

ORAL

Abstract

The choice of training set matters as much as the choice of machine learning (ML) model for the model's accuracy and range of applicability. However, most atomistic reference datasets are generated from molecular dynamics (MD) trajectories and inherit their inhomogeneous distribution across configurational space (CS). Choosing the training set randomly, or according to the probability distribution of the data, therefore leads to biased models whose prediction errors in specific regions of CS can easily exceed the mean error by a factor of three.

To bypass this issue, we combine unsupervised and supervised ML methods: (I) we cluster CS into subregions that are similar in geometry and energetics, and (II) we iteratively test an ML force field (MLFF) model on each subregion and expand the training set to flatten the prediction accuracy across CS. Applying this approach to train sGDML [Nat. Commun. 9, 3887 (2018)], GAP [Int. J. Quantum Chem. 115, 1051 (2015)], and SchNet [J. Chem. Phys. 148, 241722 (2018)] ML force fields for small organic molecules and alanine tetrapeptide, we achieve a two-fold reduction in prediction errors without increasing the training set size. Furthermore, the new models show enhanced reliability in practical applications.
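
The two-step procedure can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: configurations are single numbers, clustering is a plain 1D k-means on geometry only (the abstract clusters on geometry and energetics), a 1-nearest-neighbour lookup stands in for an MLFF such as sGDML, GAP, or SchNet, and the expansion rule (add the worst-predicted points of the worst cluster) is one plausible choice. The helper names `kmeans_1d`, `predict`, `point_errors`, `cluster_errors`, and `grow_training_set` are hypothetical.

```python
import math
import random


def kmeans_1d(xs, k, iters=25, seed=0):
    """Plain 1D k-means (illustrative); returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in xs]
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            if members:  # leave the centre unchanged if a cluster is empty
                centers[c] = sum(members) / len(members)
    return labels


def predict(train, x):
    """Toy stand-in for an ML force field: 1-nearest-neighbour energy lookup."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def point_errors(train, data):
    """Absolute prediction error for every reference point."""
    return [abs(predict(train, x) - e) for x, e in data]


def cluster_errors(errors, labels, k):
    """Mean absolute error per cluster (0.0 for an empty cluster)."""
    out = []
    for c in range(k):
        errs = [err for err, lab in zip(errors, labels) if lab == c]
        out.append(sum(errs) / len(errs) if errs else 0.0)
    return out


def grow_training_set(data, labels, k, n_init, n_final, step, seed=0):
    """Start from a random subset, then repeatedly add the worst-predicted
    points of the cluster with the largest mean error (assumed rule)."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(data)), n_init))
    while len(chosen) < n_final:
        train = [data[i] for i in sorted(chosen)]
        errors = point_errors(train, data)
        per_cluster = cluster_errors(errors, labels, k)
        # Visit clusters from worst to best; take the first with unused points.
        for c in sorted(range(k), key=lambda c: per_cluster[c], reverse=True):
            pool = [i for i in range(len(data))
                    if labels[i] == c and i not in chosen]
            if pool:
                break
        else:
            break  # no unused points remain anywhere
        pool.sort(key=lambda i: errors[i], reverse=True)
        chosen.update(pool[:min(step, n_final - len(chosen))])
    return sorted(chosen)


# Imbalanced toy "trajectory": 190 points near x = 0, only 10 near x = 3.
rng = random.Random(1)
xs = [rng.gauss(0.0, 0.3) for _ in range(190)]
xs += [rng.gauss(3.0, 0.3) for _ in range(10)]
data = [(x, math.sin(3.0 * x) + 0.1 * x * x) for x in xs]  # (geometry, energy)

k = 4
labels = kmeans_1d(xs, k)
train_idx = grow_training_set(data, labels, k, n_init=10, n_final=30, step=5)
final_errors = cluster_errors(
    point_errors([data[i] for i in train_idx], data), labels, k)
print(len(train_idx), [round(e, 3) for e in final_errors])
```

The training set size is fixed in advance by `n_final`, so any gain comes only from which points are selected, which mirrors the claim that errors are reduced without increasing the training set size.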

Presenters

  • Grégory Cordeiro Fonseca

    University of Luxembourg Limpertsberg

Authors

  • Grégory Cordeiro Fonseca

    University of Luxembourg Limpertsberg

  • Igor Poltavskyi

    University of Luxembourg Limpertsberg

  • Valentin Vassilev Galindo

    University of Luxembourg Limpertsberg

  • Alexandre Tkatchenko

    University of Luxembourg Limpertsberg; Department of Physics and Materials Science, University of Luxembourg