Comprehensive Analysis of Machine-Learning Kernels for Predicting Molecular Properties
ORAL
Abstract
Exploration of the vast chemical compound space has been widely assisted by machine learning (ML) approaches such as neural networks and kernel ridge regression (KRR). Yet a comprehensive understanding of how the individual components of an ML model affect its performance is still lacking. In this work, we analyze the influence of the components of the KRR method (representation, kernel function, distance metric) on the prediction performance for energetic and electronic quantum-mechanical molecular properties. To do so, we consider the QM7-X dataset, which contains 42 physicochemical properties for ~4.2M equilibrium and non-equilibrium structures of primarily organic molecules. Two- and three-body geometric representations, together with Gaussian and Laplacian kernels, are used to develop the KRR models. To probe the impact of the distance metric, we use the Minkowski metric, a generalization of the standard Euclidean and Manhattan distances used in KRR. This allows for non-integer norms between geometric representations, so that the influence of outliers in molecular data can be tuned. We expect our work to provide a deeper understanding of the interplay between KRR components for an optimal prediction of molecular properties of both equilibrium and out-of-equilibrium structures.
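To make the role of the distance metric concrete, the following is a minimal sketch of KRR with a Minkowski distance of order p inside Gaussian and Laplacian kernels. It is not the authors' implementation: the function names, the kernel conventions (exp(-d^2 / 2 sigma^2) and exp(-d / sigma)), the hyperparameter values, and the random stand-in data are all assumptions made for illustration. Setting p = 2 recovers the Euclidean distance and p = 1 the Manhattan distance. For non-Euclidean p the Gaussian form is not guaranteed to be positive definite, so the regularizer lam may matter more in practice.

```python
import numpy as np

def minkowski_distances(A, B, p):
    """Pairwise Minkowski distances of order p between rows of A and rows of B."""
    diff = np.abs(A[:, None, :] - B[None, :, :])
    return np.sum(diff ** p, axis=-1) ** (1.0 / p)

def kernel_matrix(A, B, p=2.0, sigma=1.0, kind="gaussian"):
    """Gaussian or Laplacian kernel built on the Minkowski distance (assumed conventions)."""
    d = minkowski_distances(A, B, p)
    if kind == "gaussian":
        return np.exp(-d ** 2 / (2.0 * sigma ** 2))
    if kind == "laplacian":
        return np.exp(-d / sigma)
    raise ValueError(f"unknown kernel kind: {kind}")

def krr_fit(X, y, p=2.0, sigma=1.0, lam=1e-8, kind="gaussian"):
    """Solve (K + lam * I) alpha = y for the regression coefficients alpha."""
    K = kernel_matrix(X, X, p, sigma, kind)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, p=2.0, sigma=1.0, kind="gaussian"):
    """Predict property values for new representation vectors."""
    return kernel_matrix(X_new, X_train, p, sigma, kind) @ alpha

if __name__ == "__main__":
    # Random vectors stand in for molecular representations and properties;
    # they are placeholders, not QM7-X data.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(50, 8))
    y_train = rng.normal(size=50)
    X_test = rng.normal(size=(5, 8))

    # Scan a few integer and non-integer norms, as the abstract describes.
    for p in (1.0, 1.5, 2.0):
        alpha = krr_fit(X_train, y_train, p=p, sigma=2.0, lam=1e-6, kind="laplacian")
        y_pred = krr_predict(X_train, alpha, X_test, p=p, sigma=2.0, kind="laplacian")
        print(p, y_pred.shape)
```

In a real study, p would be treated as one more hyperparameter and selected together with sigma and lam by cross-validation for each representation and target property.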
Presenters
-
Mirela Puleva
University of Luxembourg Limpertsberg
Authors
-
Mirela Puleva
University of Luxembourg Limpertsberg
-
Leonardo Medrano Sandonas
University of Luxembourg Limpertsberg, University of Luxembourg
-
Artem Kokorin
University of Luxembourg Limpertsberg
-
Alexandre Tkatchenko
University of Luxembourg Limpertsberg