Probing Feature Representations in ML Models for Materials Discovery Through Loss
POSTER
Abstract
Machine learning models for materials science are vulnerable to learning arbitrary data representations. We evaluate the physicality of a model’s learned representations by characterizing loss landscapes generated by in-distribution and out-of-distribution prediction tasks. Geometric features of the loss landscape indicate changes in model behavior due to perturbation of the model weights. Regions where model performance is undiminished despite perturbation suggest a feature representation that is robust at that perturbation scale. We apply this method to graph neural networks (GNNs) trained on DFT materials datasets, considering easy (nominally in-distribution) and hard (out-of-distribution) prediction of enthalpy or band gap properties. First, we consider enthalpy predictions from models trained with chemistries containing either Fe or O omitted. Models trained on element-specific selections of DFT datasets typically generalize well to omitted chemistries unless oxides are omitted from the training set. We then consider band gap prediction by models trained with omitted chemistries, for which omission of either Fe or O from the training set results in poor generalization to that chemistry. Analysis of the loss landscapes associated with these tasks provides insight into the physicality of the models’ learned representations.
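The probe described above, perturbing trained weights and recording loss along a random direction to find flat (robust) regions, can be illustrated with a minimal sketch. This is not the authors' implementation; it uses a toy least-squares model in place of a GNN, and the tolerance for "undiminished performance" is an assumed illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for a DFT property-prediction task
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.5 * rng.normal(size=200)

# "Trained" weights: the least-squares fit
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)

def loss(w):
    """Mean-squared-error loss at weight vector w."""
    r = X @ w - y
    return float(np.mean(r ** 2))

# 1D slice of the loss landscape: perturb the trained weights along a
# random unit direction d and record the loss at each perturbation scale
d = rng.normal(size=5)
d /= np.linalg.norm(d)
alphas = np.linspace(-1.0, 1.0, 41)
landscape = np.array([loss(w_star + a * d) for a in alphas])

# Flat region: perturbation scales where the loss stays within an
# (assumed) 10% tolerance of its minimum, suggesting a representation
# robust at those perturbation scales
tol = 1.1 * landscape.min()
flat = alphas[landscape <= tol]
print(f"loss undiminished for |alpha| up to ~{np.abs(flat).max():.2f}")
```

A fuller analysis would slice along many random (or curvature-aligned) directions and compare the flat-region widths between in-distribution and out-of-distribution tasks.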
Publications
-
Evaluating the Limits of the Physics Learned by a Machine Learning Model by Dale, Li, DeCost, Hattrick-Simpers
-
Loss Landscape Analysis of Model Accuracy by Dale, Li, DeCost, Hattrick-Simpers
-
Trusted AI Toolkit for Scientists (TRAITS) by Dale, Yao, Hattrick-Simpers
Presenters
-
Ashley Dale
University of Toronto
Authors
-
Ashley Dale
University of Toronto
-
Kangming Li
Acceleration Consortium, University of Toronto
-
Brian DeCost
National Institute of Standards and Technology
-
Jason Hattrick-Simpers
University of Toronto