Data Augmentation and Pre-training for Template-Based Retrosynthetic Prediction

Mike Fortunato; Connor Coley; Brian Barnes; Klavs Jensen


ORAL

Abstract

A key step in computer-aided synthesis planning (CASP) is the prioritization of candidate molecular transformations for retrosynthetic analysis. Recent methods achieving state-of-the-art accuracy have used machine learning (ML) models as recommendation engines to rank reaction templates extracted from databases of recorded reactions. However, data scarcity limits the ability of ML models to recommend rare, often highly desired, transformations. In this work we discuss the augmentation of open-access reaction databases with synthetically generated molecular transformations to teach neural networks generalized template applicability. We use this as a pre-training strategy, followed by fine-tuning of the model parameters on true, recorded reactions, in order to increase the diversity of suggested retrosynthetic transformations. While previous methods have focused on learning a one-to-one mapping from featurized molecular inputs to a single template transformation, pre-training on general template applicability allows these new models to learn a one-to-many mapping to multiple templates. The implications of performing data augmentation and pre-training on datasets of different sizes are discussed, as well as the changes in performance for rare reaction templates.
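The difference between the one-to-one and one-to-many training targets described in the abstract can be illustrated with a small sketch. This is a toy under stated assumptions, not the authors' implementation: real reaction templates are matched by subgraph search with a cheminformatics toolkit, whereas here a "template" is a hypothetical substring pattern on a SMILES-like string, and the names `TEMPLATES`, `applies`, `pretraining_target`, and `finetuning_target` are invented for illustration.

```python
from typing import List

# Toy stand-in for a template library (hypothetical patterns, not real
# reaction templates extracted from a reaction database).
TEMPLATES = ["C(=O)O", "C(=O)N", "Br", "c1ccccc1"]


def applies(template: str, molecule: str) -> bool:
    """Assumed applicability check: plain substring match instead of
    the subgraph matching a real template engine would perform."""
    return template in molecule


def pretraining_target(molecule: str) -> List[int]:
    """One-to-many target: multi-hot vector marking every template
    that is applicable to the molecule (augmented pre-training data)."""
    return [int(applies(t, molecule)) for t in TEMPLATES]


def finetuning_target(recorded_template: str) -> List[int]:
    """One-to-one target: one-hot vector marking only the template
    of the recorded reaction (fine-tuning data)."""
    return [int(t == recorded_template) for t in TEMPLATES]


molecule = "CC(=O)Nc1ccccc1Br"
multi_hot = pretraining_target(molecule)   # all applicable templates
one_hot = finetuning_target("C(=O)N")      # the single recorded template
```

In this sketch the pre-training target marks several templates for the same molecule, while the fine-tuning target marks only the one that was recorded, which is the sense in which pre-training exposes the model to templates that are applicable but rarely observed.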

March 3, 2020, 3:51 PM – March 3, 2020, 4:03 PM

Presenters

Mike Fortunato

Department of Chemical Engineering, Massachusetts Institute of Technology

Authors

Mike Fortunato

Department of Chemical Engineering, Massachusetts Institute of Technology

Connor Coley

Department of Chemical Engineering, Massachusetts Institute of Technology

Brian Barnes

Army Research Laboratory, Detonation Science and Modeling Branch, CCDC Army Research Laboratory, US Army Research Lab - Aberdeen

Klavs Jensen

Department of Chemical Engineering, Massachusetts Institute of Technology