Accelerating Fusion Science with the Data Fusion Labeler (dFL): A Framework for Rapid and Reproducible Labeling of Experimental Data
POSTER
Abstract
The proliferation of machine learning (ML) applications in fusion energy science has created a critical need for tools that can efficiently generate large, high-quality labeled datasets. To address this need, we created the Data Fusion Labeler (dFL), an application for the rapid exploration and labeling of multimodal, 1-D timeseries data. Deployed on the Saga system at General Atomics in collaboration with Hewlett Packard Enterprise, dFL is accessible to members of the broader fusion community who have been approved for DIII-D access and abide by the DIII-D data usage agreement. A key feature of the tool is its interoperability with TokSearch, a new data portability system that lets users retrieve signals from multiple fusion devices, such as DIII-D. As a demonstration of its capability, dFL was used to generate a labeled dataset of magnetic and plasma signals from DIII-D for training classifiers that differentiate between quiescent H-mode (QH), broadband turbulent QH (BBQH), and wide pedestal QH (WPQH) plasma regimes. The platform's ability to display data in multiple formats, including timeseries and spectrograms, was crucial for accurate feature identification. With dFL, labeling was five times faster than with the previous process. The resulting dataset was used to successfully train a classifier for exploring the underlying physics of these plasma regimes. dFL also promotes reproducible science by tracking data provenance via Hewlett Packard Enterprise's Common Metadata Framework, ensuring robust provenance for curated datasets and ML models.
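As an illustration of the labeling task described above, the following minimal sketch shows one way interval labels on a 1-D timeseries could be converted into per-sample class labels for classifier training. It is not dFL's API or storage format: the `IntervalLabel` class, the `labels_for_samples` function, the half-open interval convention, and the sample times are all assumptions made for this example.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalLabel:
    """Hypothetical record: one labeled time window on a signal."""
    t_start: float  # window start time (inclusive)
    t_end: float    # window end time (exclusive)
    regime: str     # e.g. "QH", "BBQH", "WPQH"


def labels_for_samples(times, intervals, default="unlabeled"):
    """Give each sample the regime of the interval that contains it.

    Assumes `times` is sorted in ascending order and that intervals
    are half-open, [t_start, t_end). Later intervals overwrite
    earlier ones where they overlap.
    """
    out = [default] * len(times)
    for iv in intervals:
        lo = bisect_left(times, iv.t_start)  # first sample >= t_start
        hi = bisect_left(times, iv.t_end)    # first sample >= t_end
        for i in range(lo, hi):
            out[i] = iv.regime
    return out


# Illustrative sample times (arbitrary units), not real DIII-D data.
times = [i * 0.5 for i in range(10)]  # 0.0, 0.5, ..., 4.5
intervals = [
    IntervalLabel(1.0, 2.0, "QH"),
    IntervalLabel(2.0, 3.5, "WPQH"),
]
sample_labels = labels_for_samples(times, intervals)
```

Samples that fall outside every labeled window keep the `default` value, so they can be filtered out before training.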
Presenters
-
Mathew Waller
Sophelio
Authors
-
Mathew Waller
Sophelio
-
Craig Michoski
SapientAI LLC
-
Zeyu Li
General Atomics
-
Brian Sammuli
General Atomics
-
Raffi M Nazikian
General Atomics
-
David Orozco
General Atomics
-
Martin Foltin
Hewlett Packard Enterprise
-
Tapan Nakkina
Sophelio