Data Labeling for Machine Learning Applications in Fusion Energy

POSTER

Abstract

To advance the pursuit of fusion energy, companies and organizations need to collect and process vast volumes of data from experiments and simulations. In this work, we have developed a data labeling application that labels data at least five times faster than comparable previous efforts to label time series from tokamak experiments. The example application is used to label plasma modes in experimental DIII-D shots, namely H-modes (ELMy H-modes), Broadband Quasi-Harmonic Modes (BBQH Modes), Quiescent H-modes (QH Modes), and Wide Pedestal Quiescent H-modes (WPQH Modes). The labeling process allows for the organization and visualization of dozens of custom (or existing) signals from individual runs of the DIII-D tokamak, together with the simultaneous examination of accompanying plots such as spectrograms. Subsequently, the accuracy of machine learning (ML)-assisted labeling increased in proportion to the amount of data in the pre-labeled training set. The design of the labeling tool also allows for integration with a variety of technical backends, from SQL-based stores to file-persisted datasets in hierarchical directories. The tool lends itself to flexible use cases and advanced customization, including custom plotting, smoothing, and normalization of time series signals and other data types hitherto not served by existing tooling, enabling diagnosis and exploration in a variety of fields and tasks related to the flourishing of fusion energy.
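
To make the smoothing and normalization of time series signals mentioned above concrete, the following is a minimal Python sketch and not the tool's actual implementation. The abstract does not say which methods the tool uses, so the centered moving average, the min-max scaling, the function names moving_average and min_max_normalize, and the sample values are all assumptions chosen for illustration.

```python
from statistics import fmean


def moving_average(signal, window=3):
    """Smooth a 1-D signal with a centered moving average.

    Assumption for illustration: `window` is odd, and samples near the
    edges are averaged over the part of the window that fits.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [
        fmean(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]


def min_max_normalize(signal):
    """Rescale a signal linearly onto the interval [0, 1]."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        # A constant signal has no range to rescale; map it to zeros.
        return [0.0 for _ in signal]
    return [(x - lo) / (hi - lo) for x in signal]


# Hypothetical sample values standing in for one diagnostic signal.
raw = [0.0, 2.0, 4.0, 2.0, 0.0]
smoothed = moving_average(raw, window=3)
normalized = min_max_normalize(smoothed)
```

Putting signals with different units on a common scale in this way is one plausible route to viewing many of them side by side while labeling; other smoothing or scaling choices would fit the same pattern.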

Presenters

  • Craig Michoski

    Sapientai

Authors

  • Craig Michoski

    Sapientai

  • Matthew Waller

    Sapientai

  • Zeyu Li

    General Atomics

  • Brian Sammuli

    General Atomics

  • Raffi M Nazikian

    General Atomics

  • Ruqi Pei

    Sapientai

  • David Orozco

    General Atomics

  • Venkitesh Ayyar

    Sapientai