Scientific Data Services Framework for Plasma Physics
ORAL
Abstract
Plasma physics experiments and simulations are producing petabytes of data. Hundreds of diagnostic tools are used with thousands of different analysis tasks on these datasets to generate scientific insight. I/O operations are often the bottleneck in these analyses. This work addresses the I/O efficiency issue by developing techniques for common data access patterns, deep storage hierarchies, and massive parallelism.
We also present a thorough theoretical analysis of data access cost, which we use to exploit structural locality and to select the best array partitioning strategy for a given operation. In a series of performance tests on large scientific datasets, we have observed that our framework outperforms Spark by as much as 2070X on the same tasks.
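To illustrate how a data access cost analysis can drive the choice of array partitioning, the following is a minimal sketch and not the framework's actual cost model. It assumes a hypothetical cost measure (the number of array cells read when every chunk also loads a halo of neighboring cells) and hypothetical function names, `cells_read` and `best_partition`, introduced only for this example.

```python
from math import ceil

def cells_read(array_shape, chunk_shape, halo):
    """Estimate cells read when each chunk also loads its halo (ghost) cells.

    Assumption for illustration: cost is proportional to cells read, and
    every chunk pays the full halo on both sides, clipped to the array size.
    """
    n_chunks = 1
    per_chunk = 1
    for n, c, h in zip(array_shape, chunk_shape, halo):
        n_chunks *= ceil(n / c)           # chunks along this dimension
        per_chunk *= min(c + 2 * h, n)    # chunk extent plus halo, clipped
    return n_chunks * per_chunk

def best_partition(array_shape, candidates, halo):
    """Return the candidate chunk shape with the lowest estimated read cost."""
    return min(candidates, key=lambda chunk: cells_read(array_shape, chunk, halo))

# A 1024 x 1024 array and an operation that needs 4 neighbors on each side
# along the second dimension only.
shape = (1024, 1024)
halo = (0, 4)
candidates = [(1024, 64), (64, 1024), (256, 256)]  # column strips, row strips, blocks

for chunk in candidates:
    print(chunk, cells_read(shape, chunk, halo))
print("best:", best_partition(shape, candidates, halo))
```

Under these assumptions, row strips come out cheapest: each strip spans the whole second dimension, so the neighbors the operation needs are already inside the chunk and no halo cells are read twice. A realistic model would also have to account for storage layout, the storage hierarchy, and the degree of parallelism.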
Presenters
- Kesheng Wu, Lawrence Berkeley National Laboratory
Authors
- Kesheng Wu, Lawrence Berkeley National Laboratory
- Bin Dong, Lawrence Berkeley National Laboratory
- Surendra Byna, Lawrence Berkeley National Laboratory