Polymer Property Prediction via Pre-trained Large Language Model

ORAL

Abstract

Accurate and efficient evaluation of polymer properties is crucial for polymer design. Conventional methods rely on expensive and time-consuming experiments or simulations to assess material functions. Transformer-based large language models have recently demonstrated superior performance across a range of applications in natural language processing and computer vision, yet such methods remain largely unexplored in polymer science. In this work, we present TransPolymer, a Transformer-based language model built on self-attention for polymer property prediction. We propose a polymer tokenization strategy that encodes material information and converts each polymer into a text sequence. TransPolymer further benefits from pre-training on large unlabeled datasets by predicting masked tokens in a self-supervised manner. Experiments demonstrate that TransPolymer surpasses baseline machine learning models on various polymer property prediction tasks, and that self-supervised pre-training outperforms training a randomly initialized Transformer from scratch. We hope this work provides a promising computational tool for polymer design and for understanding structure-property relationships from a data science perspective.
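
As a rough illustration of the two ideas the abstract describes, the sketch below shows (1) splitting a polymer string into chemically meaningful tokens and (2) randomly masking tokens for BERT-style self-supervised pre-training. This is a minimal sketch, not the authors' implementation: the regex, the '*' repeat-unit convention, the [MASK] token, and the masking rate are all assumptions for demonstration.

    import random
    import re

    # Hypothetical tokenizer: splits a SMILES-like polymer string into tokens
    # such as bracketed atoms, two-letter elements, ring digits, and bonds.
    TOKEN_PATTERN = re.compile(
        r"(\[[^\]]+\]|Br|Cl|Si|@@|[A-Za-z]|\d|[=#\-\+\(\)/\\\*%])"
    )

    def tokenize(polymer: str) -> list[str]:
        """Convert a polymer string (e.g. a repeat-unit SMILES) into tokens."""
        return TOKEN_PATTERN.findall(polymer)

    def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
        """Randomly replace tokens with [MASK]; during pre-training the model
        is trained to predict the originals at the masked positions."""
        rng = random.Random(seed)
        masked, labels = [], []
        for tok in tokens:
            if rng.random() < mask_prob:
                masked.append(mask_token)
                labels.append(tok)   # target the model must recover
            else:
                masked.append(tok)
                labels.append(None)  # no loss computed at this position
        return masked, labels

    if __name__ == "__main__":
        # '*' marks the polymerization points of the repeat unit (an assumed
        # convention; a polyethylene-like repeat unit is shown here).
        tokens = tokenize("*CC*")
        masked, labels = mask_tokens(tokens)
        print(tokens, masked, labels)

In this setup the masked sequence is the model input and the saved originals are the training targets, so the Transformer learns polymer representations from unlabeled sequences before any property labels are used.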

Presenters

  • Yuyang Wang

    Carnegie Mellon University

Authors

  • Yuyang Wang

    Carnegie Mellon University

  • Changwen Xu

    Carnegie Mellon University

  • Amir Barati Farimani

    Carnegie Mellon University