Physics-based optimization of Particle-in-Cell simulations on GPUs
POSTER
Abstract
We present progress in improving the performance of the gyrokinetic particle-in-cell (PIC) code XGC-1 on NVIDIA GPUs, as well as enhancements to portability and developer productivity made using OpenACC directives. Increasingly, simulation codes are required to use heterogeneous accelerator resources on the most powerful supercomputing systems. PIC methods are well suited to these massively parallel accelerator architectures, as particles can largely be advanced independently within a time-step. Their advance must still, however, reference field data on underlying grid structures, which presents a significant performance bottleneck. Even when ported to GPUs using CUDA Fortran, the XGC-1 electron push routine accounts for a significant portion of the code execution time. By applying physical insight into the motion of electrons across the device (and therefore across the field grids), we have developed techniques that increase the performance of this kernel by up to 5x compared to the original CUDA Fortran implementation. Architecture-specific optimizations can be isolated in small 'leaf' routines, which allows for a portable OpenACC implementation that performs nearly as well as the optimized CUDA version.
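The structure described above, an independent per-particle advance that still has to read field data from a grid, with the grid access isolated in a small leaf routine, can be illustrated with a minimal sketch. This is not XGC-1 code: it is a simplified 1D electrostatic push on a periodic grid, written in plain Python rather than CUDA Fortran or OpenACC, and the names `gather_field` and `push_particles` and all parameters are hypothetical.

```python
import math


def gather_field(field, x, dx):
    """Hypothetical leaf routine: linear interpolation of a periodic 1D grid field.

    The grid lookup is confined to this function, so an architecture-specific
    variant could in principle replace it without touching the push loop.
    """
    n = len(field)
    s = (x / dx) % n              # fractional grid index, wrapped periodically
    base = math.floor(s)
    i = int(base) % n             # left grid node
    w = s - base                  # weight of the right grid node
    return (1.0 - w) * field[i] + w * field[(i + 1) % n]


def push_particles(xs, vs, field, dx, dt, qm):
    """Advance all particles by one time-step (assumed simple explicit update).

    Each loop iteration touches only its own particle plus read-only field
    data, which is the property that makes the push parallelizable.
    """
    length = dx * len(field)
    new_xs, new_vs = [], []
    for x, v in zip(xs, vs):
        e = gather_field(field, x, dx)    # the grid reference (the bottleneck)
        v_new = v + qm * e * dt           # velocity kick from the field
        x_new = (x + v_new * dt) % length # drift with periodic wrap
        new_xs.append(x_new)
        new_vs.append(v_new)
    return new_xs, new_vs
```

In this sketch the memory access pattern of `gather_field` depends entirely on where each particle sits on the grid, so any knowledge of how particles move across the grid bears directly on how the field lookups behave.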
Authors
-
Stephen Abbott
Oak Ridge National Laboratory
-
Ed D'Azevedo
Oak Ridge National Laboratory