Polymer Property Prediction via Pre-trained Large Language Model

ORAL

Abstract

Accurate and efficient evaluation of polymer properties is crucial for polymer design. Conventional methods rely on expensive and time-consuming experiments or simulations to assess material functions. Transformer-based large language models have recently demonstrated superior performance across a range of applications in natural language processing and computer vision, yet such methods remain largely unexplored in polymer science. In this work, we present TransPolymer, a Transformer-based language model built on self-attention for polymer property prediction. We propose a polymer tokenization strategy that encodes material information and converts each polymer into a text sequence. TransPolymer further benefits from pre-training on large unlabeled datasets by predicting masked tokens in a self-supervised manner. Experiments demonstrate that TransPolymer surpasses baseline machine learning models on various polymer property prediction tasks, and that self-supervised pre-training outperforms training a randomly initialized Transformer from scratch. We hope this work provides a promising computational tool for polymer design and for understanding structure-property relationships from a data science perspective.
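
As a rough illustration of the two ideas the abstract describes, the sketch below shows (1) splitting a polymer string into chemically meaningful tokens and (2) randomly masking tokens for BERT-style self-supervised pre-training. This is a minimal sketch, not the authors' implementation: the regex, the '*' repeat-unit convention, the [MASK] token, and the masking rate are all assumptions for demonstration.

    import random
    import re

    # Hypothetical tokenizer: splits a SMILES-like polymer string into tokens
    # such as bracketed atoms, two-letter elements, ring digits, and bonds.
    TOKEN_PATTERN = re.compile(
        r"(\[[^\]]+\]|Br|Cl|Si|@@|[A-Za-z]|\d|[=#\-\+\(\)/\\\*%])"
    )

    def tokenize(polymer: str) -> list[str]:
        """Convert a polymer string (e.g. a repeat-unit SMILES) into tokens."""
        return TOKEN_PATTERN.findall(polymer)

    def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
        """Randomly replace tokens with [MASK]; during pre-training the model
        is trained to predict the originals at the masked positions."""
        rng = random.Random(seed)
        masked, labels = [], []
        for tok in tokens:
            if rng.random() < mask_prob:
                masked.append(mask_token)
                labels.append(tok)   # target the model must recover
            else:
                masked.append(tok)
                labels.append(None)  # no loss computed at this position
        return masked, labels

    if __name__ == "__main__":
        # '*' marks the polymerization points of the repeat unit (an assumed
        # convention; a polyethylene-like repeat unit is shown here).
        tokens = tokenize("*CC*")
        masked, labels = mask_tokens(tokens)
        print(tokens, masked, labels)

In this setup the masked sequence is the model input and the saved originals are the training targets, so the Transformer learns polymer representations from unlabeled sequences before any property labels are used.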

Presenters

  • Yuyang Wang

    Carnegie Mellon University

Authors

  • Yuyang Wang

    Carnegie Mellon University

  • Changwen Xu

    Carnegie Mellon University

  • Amir Barati Farimani

    Carnegie Mellon University