A Transfer Learning Framework for Improving Property Prediction, Interpretability, and Chemical Discovery from Scarce Datasets
Invited
Abstract
Machine learning (ML) is being applied in virtually all areas of the chemical sciences to advance or complement activities that have traditionally been performed with physics-based methodologies. To sustain this progress and fulfill mounting expectations, ML must grapple with the intrinsic data scarcity of many applications. While datasets for image classification, object recognition, and some molecular properties may contain millions of samples, more typical chemical applications have access to only a few hundred to a few thousand samples. In data-scarce scenarios, ML models can be severely underdetermined, exhibit limited transferability, and ultimately have poor predictive power. Transfer learning addresses these limitations with methodologies that use data from different domains, or data of mixed provenance and sparsity, to augment and robustly train data-scarce models. In this talk, I will discuss a flexible transfer learning approach that addresses data scarcity through chemical latent space enrichment, whereby disparate data sources are combined in joint prediction tasks. I will show how this approach achieves three improvements over typical supervised learning approaches: (i) increased property prediction accuracy from scarce datasets, (ii) increased model interpretability, and (iii) increased generative potential for use in the optimization and discovery of new chemistries. The talk will conclude with an outlook on current limitations, ongoing areas of improvement, and new applications.
Presenters
Brett Savoie
Purdue Univ
Authors
Brett Savoie
Purdue Univ