Probing Feature Representations in ML Models for Materials Discovery Through Loss
POSTER
Abstract
Machine learning models for materials science are vulnerable to learning arbitrary data representations. We evaluate the physicality of a model’s learned representations by characterizing loss landscapes generated by in-distribution and out-of-distribution prediction tasks. Geometric features of the loss landscape indicate changes in model behavior due to perturbation of the model weights. Regions where model performance is undiminished despite perturbation suggest a feature representation that is robust at that perturbation scale. We apply this method to graph neural networks (GNNs) trained on DFT materials datasets, considering easy (nominally in-distribution) and hard (out-of-distribution) prediction of enthalpy or band gap properties. First, we consider enthalpy predictions from models trained with chemistries containing either Fe or O omitted. Models trained on element-specific selections of DFT datasets typically generalize well to omitted chemistries unless oxides are omitted from the training set. We then consider band gap prediction by models trained with omitted chemistries, for which omission of either Fe or O from the training set results in poor generalization to that chemistry. Analysis of the loss landscapes associated with these tasks provides insight into the physicality of the models’ learned representations.
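The probe described above, perturbing trained weights and recording loss along a random direction to find flat (robust) regions, can be illustrated with a minimal sketch. This is not the authors' implementation; it uses a toy least-squares model in place of a GNN, and the tolerance for "undiminished performance" is an assumed illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for a DFT property-prediction task
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.5 * rng.normal(size=200)

# "Trained" weights: the least-squares fit
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)

def loss(w):
    """Mean-squared-error loss at weight vector w."""
    r = X @ w - y
    return float(np.mean(r ** 2))

# 1D slice of the loss landscape: perturb the trained weights along a
# random unit direction d and record the loss at each perturbation scale
d = rng.normal(size=5)
d /= np.linalg.norm(d)
alphas = np.linspace(-1.0, 1.0, 41)
landscape = np.array([loss(w_star + a * d) for a in alphas])

# Flat region: perturbation scales where the loss stays within an
# (assumed) 10% tolerance of its minimum, suggesting a representation
# robust at those perturbation scales
tol = 1.1 * landscape.min()
flat = alphas[landscape <= tol]
print(f"loss undiminished for |alpha| up to ~{np.abs(flat).max():.2f}")
```

A fuller analysis would slice along many random (or curvature-aligned) directions and compare the flat-region widths between in-distribution and out-of-distribution tasks.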
Publications
-
Evaluating the Limits of the Physics Learned by a Machine Learning Model by Dale, Li, DeCost, Hattrick-Simpers
-
Loss Landscape Analysis of Model Accuracy by Dale, Li, DeCost, Hattrick-Simpers
-
Trusted AI Toolkit for Scientists (TRAITS) by Dale, Yao, Hattrick-Simpers
Presenters
-
Ashley Dale
University of Toronto
Authors
-
Ashley Dale
University of Toronto
-
Kangming Li
Acceleration Consortium, University of Toronto
-
Brian DeCost
National Institute of Standards and Technology
-
Jason Hattrick-Simpers
University of Toronto