
Enhanced Machine Learning Models for Structure-Property Mapping with Principal Covariates Regression

ORAL

Abstract

Data analyses based on linear methods constitute the simplest, most robust, and transparent approaches to the automatic processing of large amounts of data for building supervised or unsupervised machine learning models. Principal covariates regression (PCovR) is an underappreciated method that interpolates between principal component analysis and linear regression, and can be used to conveniently reveal structure-property relations in terms of simple-to-interpret, low-dimensional maps. Here we introduce a kernelized version of PCovR and demonstrate the performance of this approach in revealing and predicting structure-property relations in chemistry and materials science. Additionally, we demonstrate the improved performance resulting from incorporating PCovR into two popular data selection methodologies, CUR and Farthest Point Sampling, which iteratively identify the most diverse samples and discriminating features.
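As a rough illustration of how PCovR interpolates between principal component analysis and linear regression, the sketch below builds a low-dimensional map from a mixed Gram matrix. It is a minimal linear version under stated assumptions, not the kernelized method presented in the talk: the function name `pcovr_latent`, the mixing parameter `alpha`, and the normalization choices are illustrative and do not come from the abstract.

```python
import numpy as np

def pcovr_latent(X, Y, alpha=0.5, n_components=2):
    """Hypothetical helper: linear PCovR latent coordinates (sample-space form).

    alpha = 1 recovers PCA scores; alpha = 0 keeps only the directions
    that carry the least-squares prediction of Y.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]

    # Center, then scale each block to unit Frobenius norm so that
    # alpha weighs the two terms on a comparable footing (an assumption).
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)

    # Ordinary least-squares approximation of Y from X.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ W

    # Mixed Gram matrix: a PCA-like term plus a regression-like term.
    K = alpha * (X @ X.T) + (1.0 - alpha) * (Y_hat @ Y_hat.T)

    # Keep the leading eigenpairs; eigh returns eigenvalues in ascending order.
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1][:n_components]
    evals = np.clip(evals[order], 0.0, None)  # guard against tiny negatives
    return evecs[:, order] * np.sqrt(evals)
```

Calling `pcovr_latent(X, Y, alpha=0.5)` on a feature matrix `X` of shape `(n_samples, n_features)` and a property vector `Y` returns an `(n_samples, 2)` array that can be plotted as a structure-property map. Sweeping `alpha` from 1 to 0 moves the map from a purely structural (PCA-like) view toward one dominated by the target property.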

Presenters

  • Rose K. Cersonsky

    Ecole Polytechnique Federale de Lausanne

Authors

  • Rose K. Cersonsky

    Ecole Polytechnique Federale de Lausanne

  • Benjamin A. Helfrecht

    Ecole Polytechnique Federale de Lausanne

  • Guillaume Fraux

    Ecole Polytechnique Federale de Lausanne

  • Edgar Engel

    Trinity College, University of Cambridge

  • Michele Ceriotti

    Ecole Polytechnique Federale de Lausanne (EPFL), Laboratory of Computational Science and Modeling, Institute of Materials, Switzerland