New Informatics Tools to Help Make Polymer Data Bigger
ORAL · Invited
Abstract
Data-driven research has the potential to change the way that we explore the world around us and solve new problems, but in polymer science we are limited by the way we handle data. Challenges in communicating what we have made and a lack of widely adopted standard representations for polymer data make it difficult to bring together the small, disparate data sets that have been collected, limiting our databases to a comparatively small number of curated data sets. Taking inspiration from biology and small-molecule chemistry, where there has been rapid recent progress, we aim to develop new informatics tools that will enable the organization, search, sharing, and widespread use of polymer data to accelerate discovery and innovation. First, we have developed BigSMILES, a standardized line notation for the chemical structure of polymers. With canonicalization rules and extensions to non-covalent chemistries, BigSMILES covers a wide range of polymer materials. These innovations in structure representation directly enable the development of a new search language, BigSMARTS. BigSMARTS uses graph-based search, as in small organic molecule search, and addresses the challenge of stochastic structures in polymers by searching over the graphs of molecular generating functions rather than over the molecules themselves. Finally, we demonstrate how these tools can be applied to accelerate data-driven research, including the development of machine learning models for block copolymers and the synthesis of chemically diverse libraries of polymers for high-throughput property characterization.
Presenters
Bradley D Olsen
Massachusetts Institute of Technology (MIT)
Authors
Bradley D Olsen
Massachusetts Institute of Technology (MIT)