Prediction of atomization energies using entropic data representation and machine learning
ORAL
Abstract
Calculating the atomization energies of molecules can be computationally expensive, and machine learning techniques have proven to be an effective and accurate way to predict these energies using the positions and charges of the atoms in the molecule as features. This information is encoded in the Coulomb matrix, but because molecules differ in their number of atoms and the atoms have no well-defined ordering, a further data representation is needed before machine learning methods can be applied effectively. Previous approaches include an eigenspectrum representation, sorted Coulomb matrices, and randomly sorted Coulomb matrices (Hansen et al., 2013). We introduce a new data representation based on a novel information entropy metric that is unaffected by the size or ordering of the Coulomb matrix. We tested this approach on the QM7 dataset, which includes structural information and atomization energies for 7165 molecules. Used on its own, our representation yields a correlation of up to 0.97 between the atomization energy and the graph information entropy. When combined with well-established learning algorithms such as neural networks and k-nearest neighbors, it achieves predictions close to the state of the art as measured by other statistical metrics.
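To make the size and ordering problem concrete, the following is a minimal sketch of the Coulomb matrix and of the sorted, zero-padded representation mentioned as prior work. It assumes the standard definition from the literature (diagonal 0.5 Z_i^2.4, off-diagonal Z_i Z_j / |R_i - R_j|, in atomic units). The function names and the three-atom toy geometry are hypothetical, and the entropy metric introduced in this work is not specified in the abstract, so it is not reproduced here.

```python
import math

def coulomb_matrix(charges, positions):
    """Coulomb matrix, assuming the standard definition (atomic units)."""
    n = len(charges)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: fitted self-interaction term 0.5 * Z^2.4
                M[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between nuclei i and j
                M[i][j] = charges[i] * charges[j] / math.dist(positions[i], positions[j])
    return M

def sorted_padded(M, size):
    """Reorder atoms by descending row norm, then zero-pad to size x size."""
    n = len(M)
    order = sorted(range(n), key=lambda i: -math.sqrt(sum(x * x for x in M[i])))
    out = [[0.0] * size for _ in range(size)]
    for a, i in enumerate(order):
        for b, j in enumerate(order):
            out[a][b] = M[i][j]
    return out

# Hypothetical toy molecule (C, O, H) with made-up coordinates in bohr.
charges = [6, 8, 1]
positions = [(0.0, 0.0, 0.0), (2.3, 0.0, 0.0), (-1.0, 1.8, 0.0)]

M = coulomb_matrix(charges, positions)
rep = sorted_padded(M, size=5)  # fixed size shared by all molecules

# The same molecule with its atoms listed in a different order.
perm = [2, 0, 1]
M_perm = coulomb_matrix([charges[k] for k in perm], [positions[k] for k in perm])
rep_perm = sorted_padded(M_perm, size=5)
```

Here `M` and `M_perm` differ entry by entry even though they describe the same molecule, while `rep` and `rep_perm` agree. Sorting and padding is one way to obtain a fixed-size, order-independent input; the entropy-based representation proposed here is intended to achieve that invariance by a different route.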
Presenters
-
Michael De La Rosa
University of Texas at El Paso
Authors
-
Michael De La Rosa
University of Texas at El Paso
-
Jorge Munoz
Physics, University of Texas at El Paso