A massive dataset of synthesis-friendly hypothetical polymers

ORAL

Abstract

Polymer informatics is an emerging field in materials science. It aims to build data-driven models that instantaneously predict the properties of polymers, and to use this capability to screen a large candidate set of polymers and identify promising ones based on their predicted properties. However, it is important for this candidate set to include synthesizable polymers. Using ~13k experimentally known polymers, we identified two distinct pathways to generate a dataset of synthesis-friendly hypothetical polymers. These pathways comprise a combinatorial assembly of retrosynthetic fragments obtained from the ~13k polymers, and a framework that treats polymers as graphs, followed by graph-to-graph translations. This has resulted in a massive dataset of 100 million hypothetical but synthesis-friendly polymers. Additionally, we quantify the synthetic feasibility of each polymer as a score and demonstrate that a large portion of the generated polymers are synthesis-ready. This massive database can be used (1) for direct screening with available property prediction models, and (2) within unsupervised approaches to train generative models that enable and accelerate polymer discovery.
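The combinatorial assembly pathway can be illustrated with a minimal sketch. It is not the authors' implementation: the fragment strings, the `join` helper, and the `enumerate_repeat_units` function are hypothetical names chosen for illustration, and the fragments are toy SMILES-like strings with `[*]` marking the two attachment points. A real workflow would obtain fragments by retrosynthetic analysis of known polymers and would check chemical validity with a cheminformatics toolkit.

```python
from itertools import product

# Hypothetical toy fragments (not taken from the dataset). Each has a
# leading and a trailing "[*]" marking where it can bond to a neighbor.
FRAGMENTS = ["[*]CC[*]", "[*]c1ccc(cc1)[*]", "[*]C(=O)O[*]"]

STAR = "[*]"


def join(left: str, right: str) -> str:
    """Fuse the trailing attachment point of `left` with the leading one of `right`."""
    if not (left.endswith(STAR) and right.startswith(STAR)):
        raise ValueError("fragments must start and end with an attachment point")
    return left[: -len(STAR)] + right[len(STAR):]


def enumerate_repeat_units(fragments, size=2):
    """Return every ordered chain of `size` fragments as a candidate repeat unit."""
    candidates = set()
    for combo in product(fragments, repeat=size):
        unit = combo[0]
        for fragment in combo[1:]:
            unit = join(unit, fragment)
        candidates.add(unit)
    return sorted(candidates)


if __name__ == "__main__":
    for unit in enumerate_repeat_units(FRAGMENTS):
        print(unit)
```

The number of candidates grows as the fragment count raised to the chain length, which is how a modest fragment library can yield a very large set of hypothetical polymers. This sketch makes no attempt to score synthetic feasibility or to merge candidates that are equivalent as periodic chains.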

Presenters

  • Arunkumar Rajan

    Georgia Institute of Technology

Authors

  • Arunkumar Rajan

    Georgia Institute of Technology

  • Chiho Kim

    School of Materials Science and Engineering, Georgia Institute of Technology

  • Christopher Kuenneth

    Georgia Institute of Technology

  • Deepak Kamal

    Georgia Institute of Technology

  • Rishi Gurnani

    Georgia Institute of Technology

  • Rohit Batra

    Georgia Institute of Technology

  • Rampi Ramprasad

    School of Materials Science and Engineering, Georgia Institute of Technology