
BigSMARTS: A Structurally-Based Line Notation for Macromolecule Search, Classification, and Reactions

POSTER

Abstract

Data-driven research, reaction retrosynthesis, and small-molecule property design are advancing rapidly thanks to open-source toolkits such as RDKit, which enable searches of deterministic small-molecule graphs encoded in SMILES (Simplified Molecular-Input Line-Entry System) using the subgraph search syntax SMARTS (SMILES Arbitrary Target Specification). BigSMILES has extended SMILES to represent polymers as ensembles of molecular graphs, motivating a parallel extension of SMARTS to the macromolecular domain. This work discusses the significant expansion and rich complexity of polymer search made possible by the new search syntax BigSMARTS. A BigSMARTS query selects the elements of a molecular graph set that contain one or more elements of a subgraph set, and it can also search within the polymer’s building blocks (repeat units and end groups). In addition, BigSMARTS allows polymer reactions to be encoded and searched, and it supports coarse-graining searches that classify chain topologies (star polymers and block copolymers) that influence properties. With the development of new search tools based on RDKit to support BigSMARTS, polymer informatics will enjoy the benefits of search that have advanced small-molecule design in the era of artificial intelligence.
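
The set-based search described above can be loosely illustrated with ordinary RDKit SMARTS matching. The sketch below is not BigSMARTS and not the authors' tooling. It assumes RDKit is installed, stands in for a polymer ensemble with a few hand-enumerated oligomer SMILES, and uses a hypothetical helper, matching_members, that returns the ensemble members containing at least one pattern from a set of SMARTS queries.

```python
from rdkit import Chem


def matching_members(ensemble_smiles, query_smarts):
    """Return the SMILES in ensemble_smiles that contain at least one
    of the SMARTS patterns in query_smarts (hypothetical helper)."""
    patterns = [Chem.MolFromSmarts(s) for s in query_smarts]
    hits = []
    for smi in ensemble_smiles:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            # Skip strings RDKit cannot parse.
            continue
        if any(mol.HasSubstructMatch(p) for p in patterns):
            hits.append(smi)
    return hits


# Assumption: three short ethylene glycol oligomers stand in for the
# (much larger) set of molecular graphs a polymer actually represents.
peo_oligomers = ["OCCO", "OCCOCCO", "OCCOCCOCCO"]

# Query set: an ether oxygen bonded to two carbons, or an ester group.
query = ["[OD2]([#6])[#6]", "[CX3](=O)[OX2][#6]"]

hits = matching_members(peo_oligomers, query)
print(hits)
```

Only the dimer and trimer contain an ether oxygen, so the monomer diol is not returned. A real BigSMARTS search would operate on the BigSMILES representation directly rather than on an enumerated list of chains.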

Presenters

  • Nathan Rebello

    Department of Chemical Engineering, Massachusetts Institute of Technology (MIT)

Authors

  • Nathan Rebello

    Department of Chemical Engineering, Massachusetts Institute of Technology (MIT)

  • Tzyy-Shyang Lin

    Massachusetts Institute of Technology (MIT)

  • Bradley Olsen

    Department of Chemical Engineering, Massachusetts Institute of Technology (MIT)