
Prediction of atomization energies using entropic data representation and machine learning

ORAL

Abstract

Calculating the atomization energies of molecules can be computationally expensive, but machine learning has proven to be an effective and accurate way to predict these energies, using the positions and nuclear charges of the atoms in each molecule as features. This information is encoded in the Coulomb matrix. However, molecules differ in their number of atoms and the matrix has no well-defined atom ordering, so another data representation is needed before machine learning methods can be applied effectively. Previous approaches include the eigenspectrum of the Coulomb matrix, sorted Coulomb matrices, and randomly sorted Coulomb matrices (Hansen et al., 2013). We introduce a new data representation based on a novel information entropy metric that is unaffected by the size or ordering of the Coulomb matrix. We tested this approach on the QM7 dataset, which contains the structures and atomization energies of 7165 molecules. Applied directly, our representation yields a correlation of up to 0.97 between the atomization energy and the graph information entropy. When combined with well-established learning algorithms such as neural networks and k-nearest neighbors, it achieves predictions close to the state of the art as measured by other statistical metrics.
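The representations named above can be illustrated with a short NumPy sketch. It builds the Coulomb matrix from the standard definition (diagonal entries 0.5 Z_i^2.4, off-diagonal entries Z_i Z_j / |R_i - R_j|), then derives two of the order-invariant, fixed-length representations attributed to Hansen et al.: the zero-padded eigenspectrum and the Coulomb matrix sorted by row norm. The function `entropy_descriptor` is a hypothetical stand-in, not the authors' entropy metric, which the abstract does not define. It treats the matrix as a weighted graph, normalizes the node strengths into a probability distribution, and returns its Shannon entropy. The padding size of 23 assumes the QM7 maximum of 23 atoms per molecule, and the coordinates in the usage lines are toy values.

```python
import numpy as np


def coulomb_matrix(Z, R):
    """Coulomb matrix: C_ii = 0.5 * Z_i**2.4, C_ij = Z_i * Z_j / |R_i - R_j|."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    # Pairwise interatomic distances, shape (n, n).
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder so the diagonal division is safe
    C = np.outer(Z, Z) / dist
    np.fill_diagonal(C, 0.5 * Z ** 2.4)  # overwrite with the self-interaction term
    return C


def eigenspectrum(C, size):
    """Eigenvalues in descending order, zero-padded to a fixed length `size`."""
    eig = np.sort(np.linalg.eigvalsh(C))[::-1]
    return np.pad(eig, (0, size - len(eig)))


def sorted_coulomb(C, size):
    """Rows and columns reordered by descending row norm, zero-padded to size x size."""
    order = np.argsort(-np.linalg.norm(C, axis=1))
    Cs = C[np.ix_(order, order)]
    out = np.zeros((size, size))
    out[: len(Cs), : len(Cs)] = Cs
    return out


def entropy_descriptor(C):
    """HYPOTHETICAL graph-entropy scalar (not the authors' metric).

    Node strength = sum of absolute row entries (diagonal included); the
    strengths are normalized to probabilities and their Shannon entropy
    (in nats) is returned. The result is a single number for any matrix
    size and does not depend on atom ordering.
    """
    strength = np.abs(C).sum(axis=1)
    p = strength / strength.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


if __name__ == "__main__":
    # Toy three-atom molecule (one heavy atom, two hydrogens); arbitrary units.
    Z = [8, 1, 1]
    R = [[0.0, 0.0, 0.0], [1.8, 0.0, 0.0], [-0.45, 1.74, 0.0]]
    C = coulomb_matrix(Z, R)
    perm = [2, 0, 1]  # same molecule, atoms listed in a different order
    Cp = coulomb_matrix([Z[i] for i in perm], [R[i] for i in perm])
    print(np.allclose(eigenspectrum(C, 23), eigenspectrum(Cp, 23)))
    print(np.allclose(sorted_coulomb(C, 23), sorted_coulomb(Cp, 23)))
    print(np.isclose(entropy_descriptor(C), entropy_descriptor(Cp)))
```

Reordering the atoms changes the raw matrix `C` but leaves all three derived representations unchanged, which is the invariance that motivates replacing the raw Coulomb matrix as a feature.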

Presenters

  • Michael De La Rosa

    University of Texas at El Paso

Authors

  • Michael De La Rosa

    University of Texas at El Paso

  • Jorge Munoz

    Physics, University of Texas at El Paso