Benchmarking Machine Learning Models for Polymer Informatics: An Example of Glass Transition Temperature

ORAL

Abstract

Various machine learning (ML) models have been shown to perform well in predicting the glass transition temperature (Tg) of polymers. However, these models are trained on different datasets, use different structure representations, and rely on different feature engineering methods. To provide a fair comparison of ML techniques and to examine the key factors that affect model performance, we carry out a systematic benchmark study by compiling 79 different ML models and training them on a large and diverse dataset. The three major components in setting up an ML model are the structure representation, the feature representation, and the ML algorithm. For the polymer structure representation, we consider the monomer, the repeat unit, and an oligomer with a longer chain structure. Feature representations are then calculated from these structures, including Morgan fingerprints with or without substructure frequency, RDKit descriptors, molecular embeddings, and molecular graphs. The resulting features are used to train different ML algorithms, such as deep neural networks, convolutional neural networks, random forests, support vector machines, LASSO regression, and Gaussian process regression. We evaluate the performance of these ML models using a holdout test set and an extra unlabeled dataset from high-throughput molecular dynamics simulations. We focus in particular on each model's ability to generalize to the unlabeled dataset, and we also consider its sensitivity to polymer topology and molecular weight. This benchmark study provides not only a guideline for the Tg prediction task but also a useful reference for other polymer informatics tasks.
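
The distinction between fingerprints "with or without substructure frequency" can be illustrated with a toy sketch. This is not the authors' code and does not use RDKit: the function fold_substructures, the string labels, and the 16-bit vector length are all hypothetical stand-ins, assuming only that a fingerprint hashes substructure identifiers into a fixed-length vector and records either presence (binary) or occurrence counts (frequency).

```python
import zlib


def fold_substructures(substructures, n_bits=16, use_counts=False):
    """Fold substructure identifiers into a fixed-length vector.

    use_counts=False records presence/absence only (binary fingerprint);
    use_counts=True records how often each hashed identifier occurs.
    """
    vector = [0] * n_bits
    for identifier in substructures:
        # crc32 is stable across runs, unlike Python's built-in hash() on str.
        index = zlib.crc32(identifier.encode("utf-8")) % n_bits
        if use_counts:
            vector[index] += 1
        else:
            vector[index] = 1
    return vector


# Hypothetical substructure labels for one repeat unit
# (illustrative strings, not real Morgan atom-environment identifiers).
repeat_unit = ["C-C", "C-C", "C-O", "C=O", "C-C"]

# Crude stand-in for a three-unit oligomer: the same environments repeated.
oligomer = repeat_unit * 3

binary_unit = fold_substructures(repeat_unit)
count_unit = fold_substructures(repeat_unit, use_counts=True)
binary_oligomer = fold_substructures(oligomer)
count_oligomer = fold_substructures(oligomer, use_counts=True)
```

In this toy setting the binary vector is identical for the repeat unit and the oligomer, while the count vector scales with chain length. Real fingerprints of an oligomer would also pick up chain-end and junction environments, so the sketch only hints at why the choice of structure representation and of frequency information may interact.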

Publication: Lei Tao, Vikas Varshney, Ying Li, Benchmarking Machine Learning Models for Polymer Informatics: An Example of Glass Transition Temperature, Journal of Chemical Information and Modeling, 2021, In Press

Presenters

  • Ying Li

    University of Connecticut

Authors

  • Ying Li

    University of Connecticut

  • Lei Tao

    University of Connecticut

  • Vikas Varshney

    Air Force Research Laboratory