Near real-time streaming analysis of big fusion data
POSTER
Abstract
Fusion plasma diagnostics, such as electron-cyclotron emission imaging
(ECEI), routinely generate fast,
high-dimensional data streams, typically on the order of gigabytes per
second. Future devices, like ITER, are predicted to generate
multiple petabytes of measurement data per day. Such large datasets
cannot be analyzed manually. Furthermore, the parties
interested in the analysis results are scattered around the globe. To
address these issues, we are developing Delta
(aDaptive nEar-reaL Time Analysis framework), a Python framework that
streams measurement data to a remote
compute center, performs data analysis using distributed compute
resources, and displays visualizations of the analyzed
data on a web-based dashboard. In this contribution we demonstrate a use case in which we stream ECEI
measurements taken at the KSTAR tokamak in Korea
to the NERSC compute center in California. Using Delta, we achieve a
bandwidth of over 500 MB/s and perform
a turbulence analysis of the entire dataset in under 5 minutes. The
analyzed data can be presented in near real-time on a
web-based dashboard. Finally, we discuss how machine learning-based
classifiers can be used in Delta to automatically target data
analysis routines to relevant subsets of the data stream.
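The stream-then-analyze pattern described in the abstract can be illustrated with a minimal sketch. This is not Delta's code or API: the function names (`stream_chunks`, `analyze`, `run_pipeline`), the synthetic sine signal, and the per-chunk variance kernel are all hypothetical stand-ins, and a local thread pool is assumed in place of the remote, distributed compute resources.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import pvariance
import math


def stream_chunks(n_chunks, chunk_len):
    """Hypothetical data source: yields (index, samples) pairs."""
    for i in range(n_chunks):
        # Synthetic signal; a real source would read diagnostic data
        # from the instrument or from a network transport.
        yield i, [math.sin(0.1 * (i * chunk_len + k)) for k in range(chunk_len)]


def analyze(chunk):
    """Placeholder analysis kernel: population variance of one chunk."""
    idx, samples = chunk
    return idx, pvariance(samples)


def run_pipeline(n_chunks=8, chunk_len=1000, workers=4):
    """Analyze all chunks with a worker pool; results keep chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order of the input chunks.
        return list(pool.map(analyze, stream_chunks(n_chunks, chunk_len)))


if __name__ == "__main__":
    for idx, var in run_pipeline():
        print(f"chunk {idx}: variance = {var:.4f}")
```

In a real deployment the chunk generator and the worker pool would sit on opposite ends of a wide-area link, and the analysis kernel would be a turbulence diagnostic rather than a variance.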
Presenters
-
Ralph Kube
Princeton Plasma Physics Laboratory, PPPL
Authors
-
Ralph Kube
Princeton Plasma Physics Laboratory, PPPL
-
Michael Churchill
Princeton Plasma Physics Laboratory
-
Jong Choi
Oak Ridge National Laboratory
-
Jason Wang
Oak Ridge National Laboratory
-
Laurie Stephey
Lawrence Berkeley National Laboratory
-
Choongseok Chang
Princeton Plasma Physics Laboratory, Princeton University
-
Scott Klasky
Oak Ridge National Laboratory