Using CycleGANs to construct training data for other Machine Learning models

Abid A Khan; Chia-Hao Lee; Pinshane Y Huang; Bryan K Clark

ORAL

Abstract

Supervised machine learning (ML) has found its way into the scientific community, proving to be incredibly useful for analyzing and classifying large datasets. Constructing useful ML models, however, requires large amounts of training data, which usually come from experiments. Often, these data require tedious manual labeling, partially defeating the purpose of using ML models in the first place. Simulated data, on the other hand, are usually more efficient to obtain and come prelabeled. However, simulated images are often limited by oversimplified models and deviate from experimental images, which limits the accuracy and precision of ML models trained on them. We present an approach to generating experiment-like data by employing a CycleGAN to automatically add realistic features and noise profiles to simulated data. We specifically use data from scanning transmission electron microscopy (STEM) and show that ML models evaluate experimental data better when trained on data generated by a CycleGAN.
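For readers unfamiliar with CycleGANs, the defining ingredient is a cycle-consistency term: one generator maps simulated images to experiment-like images, a second maps back, and the round trip is penalized for changing the input. The sketch below illustrates only that term, under loud assumptions: the two "generators" are hypothetical affine stand-ins rather than trained networks, images are flat lists of pixel values, and all function names are invented for illustration. It is not the authors' implementation, and the abstract does not state which loss variant they used.

```python
# Toy illustration of a cycle-consistency loss (assumed setup, not the authors' code).

def sim_to_exp(image, gain=0.8, offset=0.1):
    """Hypothetical stand-in for generator G: simulated -> experiment-like."""
    return [gain * p + offset for p in image]

def exp_to_sim(image, gain=0.8, offset=0.1):
    """Hypothetical stand-in for generator F: experiment-like -> simulated."""
    return [(p - offset) / gain for p in image]

def l1(a, b):
    """Mean absolute difference between two images of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(sim_batch, exp_batch, g, f):
    """Average L1 error of the round trips x -> g -> f and y -> f -> g."""
    forward = sum(l1(f(g(x)), x) for x in sim_batch) / len(sim_batch)
    backward = sum(l1(g(f(y)), y) for y in exp_batch) / len(exp_batch)
    return forward + backward

if __name__ == "__main__":
    sim_batch = [[0.0, 0.5, 1.0], [0.2, 0.4, 0.6]]  # made-up "simulated" pixels
    exp_batch = [[0.1, 0.5, 0.9], [0.3, 0.3, 0.7]]  # made-up "experimental" pixels
    # The two stand-ins are exact inverses, so the loss is close to zero.
    print(cycle_consistency_loss(sim_batch, exp_batch, sim_to_exp, exp_to_sim))
```

In a real CycleGAN this term is combined with adversarial losses from two discriminators, which are what push the translated simulations toward realistic experimental features and noise; the toy above omits them entirely.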

March 7, 2023, 8:24 PM – March 7, 2023, 8:36 PM

Presenters

Abid A Khan

University of Illinois at Urbana-Champaign

Authors

Abid A Khan

University of Illinois at Urbana-Champaign
Chia-Hao Lee

University of Illinois at Urbana-Champaign
Pinshane Y Huang

University of Illinois at Urbana-Champaign
Bryan K Clark

University of Illinois at Urbana-Champaign