Automated Knowledge Graph Generation from Text for Synthesis of Energetic Materials

ORAL

Abstract

The automated organization and management of knowledge in large amounts of heterogeneous data is a significant challenge in an era of data inundation. Natural language processing (NLP) techniques for information extraction may offer a solution. In this work, we use Relational Information Extraction (RIE) approaches to identify entities and their relations, and we propose a knowledge structure built directly from unstructured text in chemistry-relevant patents and papers. The approach uses an ontology to define type hierarchies and facilitate the graph construction. The graph is shown to contain complete synthesis procedures in subgraphs, which are themselves heterogeneous directed graphs. Through entity resolution, the graph is refactored in terms of general reactants and products across all synthesis procedures contained within the graph. To verify the accuracy of the knowledge represented in the automatically generated graph, queries based on functional groups and reaction mechanisms are used to find known connections between molecular fragments, reactants, products, and reaction mechanisms. Specifically, we confirm the existence of nitration, Claisen condensation, Boulton-Katritzky rearrangement, and Diels-Alder reactions in the graph.
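
The kind of graph and query described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `KnowledgeGraph`, the node types (`Reaction`, `Chemical`, `Mechanism`), the relation names (`has_reactant`, `has_product`, `has_mechanism`), and the two hand-entered reactions are all hypothetical, chosen only to show a heterogeneous directed graph (typed nodes, labeled edges) being queried by reaction mechanism.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy heterogeneous directed graph: typed nodes and labeled edges."""

    def __init__(self):
        self.node_type = {}              # node name -> type label
        self.edges = defaultdict(list)   # source -> [(relation, target), ...]

    def add_node(self, name, node_type):
        self.node_type[name] = node_type

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def targets(self, source, relation):
        """Return all nodes reached from `source` via `relation`, sorted."""
        return sorted(t for r, t in self.edges.get(source, []) if r == relation)

    def reactions_by_mechanism(self, mechanism):
        """Return Reaction nodes linked to the given mechanism, sorted."""
        return sorted(
            src
            for src, out in self.edges.items()
            if self.node_type.get(src) == "Reaction"
            and ("has_mechanism", mechanism) in out
        )


def build_example_graph():
    """Build a hand-written toy graph (hypothetical data, not extracted text)."""
    g = KnowledgeGraph()
    for name in ("benzene", "nitric acid", "nitrobenzene",
                 "1,3-butadiene", "ethylene", "cyclohexene"):
        g.add_node(name, "Chemical")
    for name in ("nitration", "Diels-Alder"):
        g.add_node(name, "Mechanism")

    # Each synthesis procedure is a small subgraph rooted at a Reaction node.
    g.add_node("rxn_1", "Reaction")
    g.add_edge("rxn_1", "has_reactant", "benzene")
    g.add_edge("rxn_1", "has_reactant", "nitric acid")
    g.add_edge("rxn_1", "has_product", "nitrobenzene")
    g.add_edge("rxn_1", "has_mechanism", "nitration")

    g.add_node("rxn_2", "Reaction")
    g.add_edge("rxn_2", "has_reactant", "1,3-butadiene")
    g.add_edge("rxn_2", "has_reactant", "ethylene")
    g.add_edge("rxn_2", "has_product", "cyclohexene")
    g.add_edge("rxn_2", "has_mechanism", "Diels-Alder")
    return g


if __name__ == "__main__":
    graph = build_example_graph()
    for rxn in graph.reactions_by_mechanism("nitration"):
        print(rxn,
              graph.targets(rxn, "has_reactant"),
              graph.targets(rxn, "has_product"))
```

In this sketch a verification query amounts to asking whether a known mechanism, such as nitration, is connected to the expected reactants and products. A real pipeline would populate the nodes and edges from extracted entities and relations rather than by hand.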

Presenters

  • Connor P Oryan

    University of Maryland, College Park

Authors

  • Connor P Oryan

    University of Maryland, College Park