Algorithm for computing descriptive statistics for very large data sets and the exa-scale era
ORAL
Abstract
An algorithm for Single-point, Parallel, Online, Converging Statistics (SPOCS) is presented. It is suited for in situ analysis that traditionally would be relegated to post-processing, and can be used to monitor statistical convergence and to estimate the error or residual in each computed quantity, which is also useful for uncertainty quantification. Today, numerical simulations and the proliferating sensing apparatus of experiments and engineering applications can generate data at an overwhelming rate. Monitoring descriptive statistics in real time lets costly computations and experiments be gracefully aborted if an error has occurred, and monitoring the level of statistical convergence allows them to be run only for as long as is required to obtain good results. This algorithm extends work by Pébay (Sandia Report SAND2008-6212): Pébay's algorithms are recast into a converging delta formulation with provably favorable properties. The mean, variance, covariances, and arbitrary higher-order statistical moments are computed in one pass. The algorithm is tested using Sillero, Jiménez, & Moser's (2013, 2014) publicly available UPM high-Reynolds-number turbulent boundary layer data set, demonstrating numerical robustness, efficiency, and other favorable properties.
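The abstract gives no formulas, so the following is only a minimal sketch of the general idea of one-pass, mergeable statistics. It uses the textbook Welford update and the standard pairwise combination formula for the mean and variance, not the SPOCS formulation itself. The class name `RunningStats` and its methods are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """One-pass accumulator (illustrative only; not the SPOCS algorithm)."""

    n: int = 0         # number of samples seen so far
    mean: float = 0.0  # running mean
    m2: float = 0.0    # running sum of squared deviations from the mean

    def update(self, x: float) -> float:
        """Fold in one sample and return the resulting change in the mean."""
        self.n += 1
        delta = x - self.mean
        step = delta / self.n
        self.mean += step
        # Uses the deviation from the old mean times the deviation from the new one.
        self.m2 += delta * (x - self.mean)
        return step

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine two partial results, e.g. from two parallel workers."""
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        """Unbiased sample variance; 0.0 until two samples have arrived."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0
```

In this sketch each worker would call `update` on its own samples and the partial results would be combined with `merge`. The value returned by `update` is the increment applied to the mean, which could be watched as a rough convergence indicator under the assumption that it shrinks as more samples arrive.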
Authors
Izaak Beekman
ParaTools Inc.