Data augmentation techniques to improve material property prediction performance using Graph Neural Networks
ORAL
Abstract
In recent years, Graph Neural Network (GNN) based methodologies have been used extensively for material property prediction. Although these GNNs predict material properties with very high accuracy, they rely on large amounts of data for training. Often, these large datasets are generated from ab-initio calculations or experiments, which are resource-intensive and time-consuming, limiting the applicability of GNNs. To overcome the lack of data, we introduce five physics-informed data augmentations - Perturbation, Rotation, SwapAxes, Translation and SuperCell transformation - that can be applied to crystalline systems to increase the amount of data available for GNN training. Using these augmentation techniques, we show performance improvements for 4 state-of-the-art GNN models - CGCNN, MEGNet, GINE and SchNet - on 5 different datasets. We observe a performance gain of 10%-50% on most of the models, demonstrating the effectiveness of data augmentation in training GNNs. We also perform ablation studies to determine the most effective augmentation strategies for a particular material property. Finally, we develop an open-source software package that performs these augmentations under the hood and make it available for public use.
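To make the five augmentations concrete, here is a minimal sketch of what such transformations might look like when applied directly to a crystal's atomic coordinates and lattice vectors. The function names, signatures, and NumPy-array representation are illustrative assumptions for this sketch, not the Auglichem API.

```python
import numpy as np

def perturb(positions, sigma=0.05, rng=None):
    """Perturbation: add small Gaussian noise to Cartesian atomic positions."""
    rng = np.random.default_rng(rng)
    return positions + rng.normal(0.0, sigma, positions.shape)

def rotate_z(positions, theta):
    """Rotation: rotate all atomic positions about the z-axis by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return positions @ R.T

def swap_axes(positions, axes=(0, 1)):
    """SwapAxes: exchange two Cartesian axes of every atomic coordinate."""
    swapped = positions.copy()
    swapped[:, [axes[0], axes[1]]] = swapped[:, [axes[1], axes[0]]]
    return swapped

def translate(positions, lattice, frac_shift):
    """Translation: rigidly shift all atoms by a fractional cell translation."""
    return positions + np.asarray(frac_shift) @ lattice

def supercell(positions, lattice, reps=(2, 1, 1)):
    """SuperCell: tile the unit cell along each lattice vector."""
    images = []
    for i in range(reps[0]):
        for j in range(reps[1]):
            for k in range(reps[2]):
                shift = i * lattice[0] + j * lattice[1] + k * lattice[2]
                images.append(positions + shift)
    new_lattice = lattice * np.asarray(reps)[:, None]
    return np.vstack(images), new_lattice
```

Because these are rigid or lattice-preserving operations, the physical properties of the crystal are unchanged (or, for supercells, trivially rescaled), so each augmented copy is a physically valid training example for the GNN.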
Publication: Planned paper: Auglichem: Data Augmentation Library of Chemical Structures for Machine Learning
Presenters
Rishikesh Magar
Carnegie Mellon University
Authors
Rishikesh Magar
Carnegie Mellon University