Bias-imbalance in data-driven materials science: a case study on MODNet

Pierre-Paul De Breuck; Matthew L Evans; Gian-Marco Rignanese

ORAL

Abstract

As the number of novel data-driven approaches to materials science continues to grow, it is crucial to assess the quality, reliability and applicability of model predictions in a consistent way. An important part of this task is assessing a model's uncertainty with respect to a target domain. Test errors can vary significantly depending on the imbalance and bias in the training set (i.e., the similarity between the training and application spaces). To illustrate this, the Materials Optimal Descriptor Network (MODNet), a method designed for small datasets, is used as a case study on MatBench v0.1, a curated test suite of materials datasets. By using an ensemble MODNet model, confidence intervals can be built and the uncertainty on individual predictions can be quantified. Imbalance and bias issues are often overlooked, yet they are important for successful real-world applications of machine learning in materials science and condensed matter.
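The ensemble idea mentioned above can be illustrated with a minimal sketch. It does not use MODNet or its API: it assumes a bootstrap ensemble of plain least-squares linear models built with NumPy, and the helper names (`fit_linear`, `predict_linear`, `ensemble_predict`), the synthetic data and the Gaussian 1.96 factor for a 95% interval are all hypothetical choices made for illustration only.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Evaluate a fitted linear model on new inputs."""
    A = np.column_stack([X, np.ones(len(X))])
    return A @ coef


def ensemble_predict(X_train, y_train, X_test, n_models=50, seed=0):
    """Train n_models on bootstrap resamples; return mean and spread per test point."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = np.empty((n_models, len(X_test)))
    for m in range(n_models):
        idx = rng.integers(0, n, size=n)  # resample the training set with replacement
        coef = fit_linear(X_train[idx], y_train[idx])
        preds[m] = predict_linear(coef, X_test)
    # The spread across ensemble members serves as the uncertainty estimate.
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)


# Synthetic training data confined to x in [0, 1] (an assumption for illustration).
rng = np.random.default_rng(42)
X_train = rng.uniform(0.0, 1.0, size=(40, 1))
y_train = 2.0 * X_train[:, 0] + rng.normal(0.0, 0.1, size=40)

# One test point inside the training range and one far outside it.
X_test = np.array([[0.5], [3.0]])
mean, std = ensemble_predict(X_train, y_train, X_test)

# Approximate 95% interval, assuming a Gaussian spread of ensemble predictions.
lower, upper = mean - 1.96 * std, mean + 1.96 * std
for x, m, lo, hi in zip(X_test[:, 0], mean, lower, upper):
    print(f"x={x:.1f}: prediction={m:.3f}, interval=[{lo:.3f}, {hi:.3f}]")
```

In this toy setting the interval at x = 3.0, which lies outside the sampled training range, comes out wider than the one at x = 0.5, mirroring the point that uncertainty depends on how similar the application space is to the training space.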

March 17, 2022, 1:30 PM – March 17, 2022, 1:42 PM

Publication: De Breuck, P.-P., Evans, M. L. & Rignanese, G.-M. Robust model benchmarking and bias-imbalance in data-driven materials science: a case study on MODNet. J. Phys.: Condens. Matter 33, 404002 (2021).

Presenters

Pierre-Paul De Breuck

Université catholique de Louvain

Authors

Pierre-Paul De Breuck

Université catholique de Louvain

Matthew L Evans

Université catholique de Louvain

Gian-Marco Rignanese

Université catholique de Louvain