Bias-imbalance in data-driven materials science: a case study on MODNet

ORAL

Abstract

As the number of novel data-driven approaches to materials science continues to grow, it is crucial to perform consistent quality, reliability and applicability assessments of model predictions. In this respect, an important task is assessing a model's uncertainty with respect to a target domain. Test errors can vary significantly depending on the imbalance and bias in the training set (i.e., on the similarity between the training and application spaces). To illustrate this, the Materials Optimal Descriptor Network (MODNet), a method for small datasets, is used as a case study on MatBench v0.1, a curated test suite of materials datasets. By using an ensemble MODNet model, confidence intervals can be built and the uncertainty on individual predictions can be quantified. Imbalance and bias issues are often overlooked, yet they are important for successful real-world applications of machine learning in materials science and condensed matter.
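
The ensemble idea mentioned in the abstract can be illustrated with a minimal sketch. It does not use MODNet or its API. As an assumption made purely for illustration, it fits a bootstrap ensemble of plain least-squares linear models with NumPy and takes the spread of the member predictions as an approximate confidence interval. The function names, the synthetic data and the factor z = 1.96 are all hypothetical choices, not taken from the publication.

```python
import numpy as np


def fit_bootstrap_ensemble(X, y, n_models=50, seed=0):
    """Fit n_models least-squares linear models, each on a bootstrap resample.

    Illustrative stand-in for an ensemble of learned models; not MODNet.
    """
    rng = np.random.default_rng(seed)
    A = np.column_stack([X, np.ones(len(X))])  # append an intercept column
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        w, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        weights.append(w)
    return np.array(weights)  # shape: (n_models, n_features + 1)


def predict_with_interval(weights, X, z=1.96):
    """Return the ensemble mean and a mean +/- z * std band per sample."""
    A = np.column_stack([X, np.ones(len(X))])
    preds = A @ weights.T  # shape: (n_samples, n_models)
    mean = preds.mean(axis=1)
    std = preds.std(axis=1, ddof=1)  # spread across ensemble members
    return mean, mean - z * std, mean + z * std


# Synthetic training data covering only the range [0, 1] (assumed for the demo).
rng = np.random.default_rng(1)
X_train = rng.uniform(0.0, 1.0, size=(40, 1))
y_train = 2.0 * X_train[:, 0] + rng.normal(0.0, 0.1, size=40)

ensemble = fit_bootstrap_ensemble(X_train, y_train)

# One query inside the training range and one far outside it.
X_test = np.array([[0.5], [3.0]])
mean, lower, upper = predict_with_interval(ensemble, X_test)
width = upper - lower  # interval width per query point
```

In this toy setting the interval is wider for the query far from the training range than for the one inside it, which mirrors the point about similarity between training and application space. The ensemble spread only reflects disagreement between members, so it should be read as a rough indicator and not as a calibrated uncertainty.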

Publication: De Breuck, P.-P., Evans, M. L. & Rignanese, G.-M. Robust model benchmarking and bias-imbalance in data-driven materials science: a case study on MODNet. J. Phys.: Condens. Matter 33, 404002 (2021).

Presenters

  • Pierre-Paul De Breuck

    Universite catholique de Louvain

Authors

  • Pierre-Paul De Breuck

    Universite catholique de Louvain

  • Matthew L Evans

    Universite catholique de Louvain

  • Gian-Marco Rignanese

    Universite catholique de Louvain