MaterialEyes: Utilizing literature to characterize materials from images

ORAL

Abstract

Recent improvements in materials microscopy have led to an explosion of published imaging data. The standard publication format, while sufficient for traditional studies, is not conducive to large-scale data aggregation or analysis, which hinders data sharing and reuse. In the MaterialEyes project, we use computer vision and natural language processing tools to leverage materials characterization data in the scientific literature. We develop the EXSCLAIM Python toolkit [1] for the automatic EXtraction, Separation, and Caption-based natural Language Annotation of IMages from scientific literature [2]. We discuss the construction of EXSCLAIM [3] and demonstrate its ability to extract and label open-source scientific images at high volume. To further exploit the dataset constructed by the EXSCLAIM pipeline, we focus on two subsequent tasks: (1) building a hybrid image retrieval system that measures both the visual similarity and the scale similarity between microscopy images crawled from the literature, so that the caption text can be used to interpret a query image; and (2) automatically extracting spectral data from spectroscopy plots, for which we develop the Plot2Spectra tool [4] to locate the axes, recognize the ticks, and extract the plot lines.
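To make the last step more concrete, the sketch below shows one generic way that recognized ticks can be used to convert an extracted plot line from pixel coordinates to data coordinates. It is an illustration only and is not taken from Plot2Spectra: the function names (`fit_axis`, `pixels_to_data`), the tick positions, and the curve points are all hypothetical, and both axes are assumed to be linear.

```python
def fit_axis(pixels, values):
    """Least-squares fit of value = slope * pixel + intercept from tick marks."""
    n = len(pixels)
    if n < 2 or n != len(values):
        raise ValueError("need at least two ticks with matching values")
    mean_p = sum(pixels) / n
    mean_v = sum(values) / n
    var_p = sum((p - mean_p) ** 2 for p in pixels)
    if var_p == 0:
        raise ValueError("tick pixel positions must be distinct")
    cov = sum((p - mean_p) * (v - mean_v) for p, v in zip(pixels, values))
    slope = cov / var_p
    intercept = mean_v - slope * mean_p
    return slope, intercept


def pixels_to_data(curve_px, x_ticks, y_ticks):
    """Map (column, row) pixel points on a curve to (x, y) data coordinates.

    x_ticks and y_ticks are lists of (pixel_position, tick_value) pairs,
    assumed to come from an earlier tick-recognition step.
    """
    ax, bx = fit_axis([p for p, _ in x_ticks], [v for _, v in x_ticks])
    ay, by = fit_axis([p for p, _ in y_ticks], [v for _, v in y_ticks])
    return [(ax * col + bx, ay * row + by) for col, row in curve_px]


# Hypothetical inputs: tick pixel positions paired with their printed values.
x_ticks = [(100, 0.0), (300, 500.0), (500, 1000.0)]
y_ticks = [(400, 0.0), (200, 1.0)]  # image rows grow downward, so the slope is negative
curve_px = [(100, 400), (300, 200), (500, 300)]

spectrum = pixels_to_data(curve_px, x_ticks, y_ticks)
```

Fitting a line through all ticks, rather than using only two of them, averages out small localization errors in individual tick positions. A logarithmic axis would need the tick values transformed before fitting.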

Publication:

[1] https://github.com/MaterialEyes/exsclaim

[2] E Schwenker, W Jiang, T Spreadbury, N Ferrier, O Cossairt, MKY Chan, "EXSCLAIM!--An automated pipeline for the construction of labeled materials imaging datasets from literature," arXiv preprint arXiv:2103.10631.

[3] W Jiang, E Schwenker, T Spreadbury, N Ferrier, MKY Chan, O Cossairt, "A Two-stage Framework for Compound Figure Separation," 2021 IEEE International Conference on Image Processing (ICIP), DOI: 10.1109/ICIP42928.2021.9506171.

[4] W Jiang, E Schwenker, T Spreadbury, K Li, MKY Chan, O Cossairt, "Plot2Spectra: an Automatic Spectra Extraction Tool," arXiv preprint arXiv:2107.02827.

Presenters

  • Weixin Jiang

    Northwestern University

Authors

  • Weixin Jiang

    Northwestern University

  • Eric Schwenker

    Argonne National Laboratory

  • Trevor Spreadbury

    Argonne National Laboratory

  • Oliver Cossairt

    Northwestern University

  • Maria K Chan

    Argonne National Laboratory