APS Logo

Accelerating Fusion Science with the Data Fusion Labeler (dFL): A Framework for Rapid and Reproducible Labeling of Experimental Data

POSTER

Abstract

The proliferation of machine learning (ML) applications in fusion energy science has created a critical need for tools that can efficiently generate large, high-quality labeled datasets. To address this need, we created the Data Fusion Labeler (dFL), an application for the rapid exploration and labeling of multimodal, 1-D timeseries data. dFL is deployed on the Saga system at General Atomics in collaboration with Hewlett Packard Enterprise (HPE). It is available to members of the broader fusion community who have been approved for DIII-D access and who abide by the DIII-D data usage agreement. A key feature of the tool is its interoperability with TokSearch, a new data portability system that lets users retrieve signals from multiple fusion devices, such as DIII-D. As a demonstration of its capability, dFL was used to generate a labeled dataset of DIII-D magnetic and plasma signals for training classifiers that distinguish among quiescent H-mode (QH), broadband turbulent QH (BBQH), and wide pedestal QH (WPQH) plasma regimes. The platform's ability to display data in multiple formats, including timeseries and spectrograms, was crucial for accurate feature identification. dFL sped up the previous labeling process by a factor of five. The resulting dataset was successfully used to train a classifier for exploring the underlying physics of these plasma regimes. dFL also promotes reproducible science by recording data provenance through HPE's Common Metadata Framework, so that curated datasets and ML models can be traced back to their sources.

Presenters

  • Mathew Waller

    Sophelio

Authors

  • Mathew Waller

    Sophelio

  • Craig Michoski

    SapientAI LLC

  • Zeyu Li

    General Atomics

  • Brian Sammuli

    General Atomics

  • Raffi M Nazikian

    General Atomics

  • David Orozco

    General Atomics

  • Martin Foltin

    Hewlett Packard Enterprise

  • Tapan Nakkina

    Sophelio