Scientific Data Services Framework for Plasma Physics

ORAL

Abstract

Plasma physics experiments and simulations are producing petabytes of data. Hundreds of diagnostic tools are being used with thousands of different analysis tasks on these datasets to generate scientific insight. I/O operations are often the bottleneck in these analyses. This work addresses the I/O efficiency issue by developing techniques for common data access patterns, for deep storage hierarchies, and for massive parallelism.

Additionally, we present a thorough theoretical analysis of data access cost that exploits structural locality and selects the best array partitioning strategy for a given operation. In a series of performance tests on large scientific datasets, our framework outperformed Spark by as much as 2070X on the same tasks.
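
To make the idea of cost-driven partitioning concrete, the sketch below picks a processor grid for a neighborhood (stencil-like) operation by minimizing the number of extra "ghost" cells each chunk must read from its neighbors. It is only an illustration under simple assumptions, not the cost model used in the framework: the function names (`halo_cells`, `factorizations`, `best_partition`), the ghost-cell count as the cost metric, and the restriction to evenly divisible regular grids are all hypothetical choices made for this example.

```python
def halo_cells(chunk_shape, halo):
    """Extra cells a chunk reads beyond its own data when it is padded
    by `halo[d]` cells on both sides of every dimension d."""
    padded = 1
    own = 1
    for c, h in zip(chunk_shape, halo):
        padded *= c + 2 * h
        own *= c
    return padded - own


def factorizations(n, dims):
    """Yield every ordered way to write n as a product of `dims` positive integers."""
    if dims == 1:
        yield (n,)
        return
    for f in range(1, n + 1):
        if n % f == 0:
            for rest in factorizations(n // f, dims - 1):
                yield (f,) + rest


def best_partition(array_shape, n_procs, halo):
    """Return (total_ghost_cells, process_grid, chunk_shape) with the lowest cost.

    Assumptions of this sketch: one chunk per process, only grids that divide
    the array evenly are considered, and ghost cells are counted for every
    chunk, including those on the array boundary (a slight overestimate).
    """
    best = None
    for grid in factorizations(n_procs, len(array_shape)):
        # Skip grids that would produce uneven chunks.
        if any(s % g for s, g in zip(array_shape, grid)):
            continue
        chunk = tuple(s // g for s, g in zip(array_shape, grid))
        cost = n_procs * halo_cells(chunk, halo)
        if best is None or cost < best[0]:
            best = (cost, grid, chunk)
    return best


if __name__ == "__main__":
    # Symmetric 3x3 neighborhood: square chunks minimize the ghost-zone volume.
    print(best_partition((1024, 1024), 16, (1, 1)))
    # Neighborhood only along the second dimension: splitting along the first
    # dimension keeps every neighbor access mostly local to the chunk.
    print(best_partition((1024, 1024), 16, (0, 1)))
```

In this toy model the preferred partitioning changes with the access pattern of the operation: a symmetric neighborhood favors a 4 x 4 grid, while a one-directional neighborhood favors splitting along the dimension the operation never crosses.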

Presenters

  • Kesheng Wu

    Lawrence Berkeley National Laboratory

Authors

  • Kesheng Wu

    Lawrence Berkeley National Laboratory

  • Bin Dong

    Lawrence Berkeley National Laboratory

  • Surendra Byna

    Lawrence Berkeley National Laboratory