Comprehensive Analysis of Machine-Learning Kernels for Predicting Molecular Properties

ORAL

Abstract

Exploration of the vast chemical compound space has been widely assisted by machine learning (ML) approaches, e.g., neural networks and kernel ridge regression (KRR). Yet a comprehensive understanding of how the different components of an ML model contribute to its performance is still lacking. In this work, we analyze the influence of the components of the KRR method (representation, kernel function, distance metric) on the prediction performance for energetic and electronic quantum-mechanical molecular properties. To do so, we consider the QM7-X dataset, which contains 42 physicochemical properties for ~4.2M equilibrium and non-equilibrium, primarily organic, molecular structures. Two- and three-body geometric representations as well as Gaussian and Laplacian kernels are used to develop the KRR models. To probe the impact of the distance metric, we use the Minkowski metric, a generalized form of the standard Euclidean and Manhattan distances used in KRR. This allows for non-integer norms between geometric representations, which makes it possible to tune the influence of outliers in molecular data. We expect our work to provide a deeper understanding of the correlation between KRR components for an optimal prediction of molecular properties of both equilibrium and out-of-equilibrium structures.
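The role of the Minkowski order in KRR can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`minkowski_distances`, `kernel`, `krr_fit`, `krr_predict`), the hyperparameters `p`, `sigma`, and `lam`, and the random toy vectors standing in for molecular representations are all assumptions made for illustration. The distance between two representation vectors is taken as (Σ_k |a_k − b_k|^p)^(1/p), which reduces to the Manhattan distance for p = 1 and the Euclidean distance for p = 2. The kernel matrix is not guaranteed to be positive definite for every combination of kernel and p, so a small regularizer is kept.

```python
import numpy as np

def minkowski_distances(A, B, p):
    """Pairwise Minkowski distances of order p between the rows of A and B."""
    diff = np.abs(A[:, None, :] - B[None, :, :])   # shape (n_A, n_B, n_features)
    return np.sum(diff ** p, axis=-1) ** (1.0 / p)

def kernel(A, B, p=2.0, sigma=1.0, kind="gaussian"):
    """Gaussian or Laplacian kernel built on the Minkowski distance."""
    d = minkowski_distances(A, B, p)
    if kind == "gaussian":
        return np.exp(-d ** 2 / (2.0 * sigma ** 2))
    if kind == "laplacian":
        return np.exp(-d / sigma)
    raise ValueError(f"unknown kernel kind: {kind!r}")

def krr_fit(X, y, lam=1e-8, p=2.0, sigma=1.0, kind="gaussian"):
    """Solve (K + lam * I) alpha = y for the regression coefficients alpha."""
    K = kernel(X, X, p=p, sigma=sigma, kind=kind)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, p=2.0, sigma=1.0, kind="gaussian"):
    """Predict targets for X_new from the fitted coefficients."""
    return kernel(X_new, X_train, p=p, sigma=sigma, kind=kind) @ alpha

# Toy data: random vectors stand in for molecular representations (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = np.sin(X).sum(axis=1)
X_train, y_train, X_test, y_test = X[:30], y[:30], X[30:], y[30:]

# Scan a few Minkowski orders, including a non-integer one.
for p in (1.0, 1.5, 2.0):
    alpha = krr_fit(X_train, y_train, p=p, sigma=2.0, kind="laplacian")
    pred = krr_predict(X_train, alpha, X_test, p=p, sigma=2.0, kind="laplacian")
    print(f"p = {p}: test MAE = {np.mean(np.abs(pred - y_test)):.3f}")
```

In practice `p` would be treated as one more hyperparameter and selected by cross-validation together with `sigma` and `lam`; the loop above only shows where it enters the model.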

Presenters

  • Mirela Puleva

    University of Luxembourg Limpertsberg

Authors

  • Mirela Puleva

    University of Luxembourg Limpertsberg

  • Leonardo Medrano Sandonas

    University of Luxembourg Limpertsberg, University of Luxembourg

  • Artem Kokorin

    University of Luxembourg Limpertsberg

  • Alexandre Tkatchenko

    University of Luxembourg Limpertsberg