BigSMILES: A Digitalization Scheme for Data-Driven Macromolecules Research

POSTER

Abstract

In polymer research, a major hurdle to adopting data-driven modeling approaches is the lack of a general digitalization scheme for polymeric systems. To address this issue, a digitalization scheme is proposed that consists of two components: first, a structure-based line notation that specifies how different repeating units interconnect to form polymers, and second, a data format that quantitatively specifies the distributional properties associated with the structure described by the first component. The new line notation, BigSMILES, is built on top of the popular line notation SMILES and encodes the chemical structures of polymeric fragments with “stochastic objects” that specify the constituent repeating units and the permissible set of connectivity patterns between them. Along with the accompanying data standard, BigSMILES provides a compact, machine-friendly, yet versatile route to digitally encode and report polymeric materials. It is hoped that the proposed scheme can be readily used by both materials scientists and modeling experts to enable rapid development of data-driven polymer research.
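To make the idea of a stochastic object concrete, here is a minimal Python sketch. It is illustrative only and is not part of the presented work. It assumes the bracket syntax of the published BigSMILES notation, which this abstract does not spell out: a stochastic object is wrapped in curly braces, repeat units are separated by commas, optional end groups follow a semicolon, and `[$]` marks a bonding site. The function name `split_stochastic_object` and the example string (an ethylene/1-butene random copolymer) are hypothetical choices for illustration. The sketch ignores nested stochastic objects and any validation of the SMILES fragments themselves.

```python
def split_stochastic_object(text):
    """Split one simplified BigSMILES-style stochastic object.

    Returns (repeat_units, end_groups) as lists of strings.
    Assumption: no nested stochastic objects inside the braces.
    """
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("expected a single object wrapped in curly braces")
    body = text[1:-1]
    # Repeat units come before the first ';', end groups (if any) after it.
    units_part, _, ends_part = body.partition(";")
    repeat_units = [u for u in units_part.split(",") if u]
    end_groups = [e for e in ends_part.split(",") if e]
    return repeat_units, end_groups


# Hypothetical example: random copolymer of ethylene and 1-butene.
copolymer = "{[$]CC[$],[$]CC(CC)[$]}"
units, ends = split_stochastic_object(copolymer)
print(units)  # two repeat units, each flanked by [$] bonding descriptors
print(ends)   # empty list: no end groups were specified
```

The string lists which repeat units may occur and how they may connect, not one fixed chain, which is what distinguishes it from a plain SMILES string for a single molecule.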

Presenters

  • Tzyy-Shyang Lin

    Massachusetts Institute of Technology (MIT)

Authors

  • Tzyy-Shyang Lin

    Massachusetts Institute of Technology (MIT)

  • Bradley Olsen

    Massachusetts Institute of Technology (MIT)