Efficient GPU implementation for implicit particle-in-cell simulations

G. Chen; Luis Chacon; D.C. Barnes

POSTER

Abstract

A recent proof-of-principle study of an energy- and charge-conserving, fully implicit particle-in-cell (PIC) algorithm\footnote{G. Chen, L. Chac\'on, and D.C. Barnes, {\it J. Comput. Phys.} {\bf 18}(2011).} demonstrated that accurate and efficient PIC simulations with very large time steps are possible. A key component of the algorithm is the enslavement of particle orbits to the field equations. With particle enslavement, orbit integration is a segregated operation, which is well suited to emerging heterogeneous architectures that combine CPUs with GPUs. GPUs are promising for implicit PIC because it is naturally data parallel (and thus suited to extreme multi-threading) and compute-bound (whereas explicit schemes are typically memory-bound). However, the particle mover in [1] is adaptive, and particles have to stop at cell boundaries to conserve charge locally. This creates load imbalances and dynamic control flow, which make it challenging to fully utilize the GPU's computing power. This work demonstrates that a highly efficient GPU implementation of the implicit particle mover (using CUDA) is possible. We obtain 300 to 400 GOp/s (counting floating-point, integer, and special-function operations) in single precision. This is about 20\% to 25\% of the peak performance of the GPU (GeForce GTX 580), and about 200 to 300 times faster than a single-CPU (Xeon at 3.16 GHz) implementation.
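
To illustrate why stopping at cell boundaries leads to load imbalance, the following minimal Python sketch advances a free-streaming 1D particle over one time step and counts the sub-steps it needs. It is an illustration only, not the implicit mover of [1]: the function name `push_with_cell_stops`, the constant-velocity orbit, and the uniform grid of spacing `dx` are all assumptions made here for brevity.

```python
import math

def push_with_cell_stops(x, v, dt, dx):
    """Advance a constant-velocity particle by dt, stopping at each cell face.

    Hypothetical helper for illustration (not the mover of [1]).
    Returns (final position, number of sub-steps taken).
    """
    cell = math.floor(x / dx)        # index of the cell containing the particle
    t_left = dt                      # time remaining in this step
    steps = 0
    while t_left > 0.0 and v != 0.0:
        # Next cell face in the direction of motion.
        face = (cell + 1) * dx if v > 0.0 else cell * dx
        t_face = (face - x) / v      # time needed to reach that face (>= 0)
        steps += 1
        if t_face >= t_left:
            # The step ends inside the current cell.
            x += v * t_left
            t_left = 0.0
        else:
            # Stop exactly on the face, then continue in the neighboring cell.
            x = face
            t_left -= t_face
            cell += 1 if v > 0.0 else -1
    return x, steps

# A slow and a fast particle starting from the same position.
slow = push_with_cell_stops(0.5, 0.25, 1.0, 1.0)
fast = push_with_cell_stops(0.5, 4.0, 1.0, 1.0)
```

In this sketch the slow particle finishes in one sub-step while the fast one needs five. If each particle were handled by its own GPU thread, threads in the same group would run different numbers of loop iterations, which is the kind of dynamic control flow and load imbalance described above.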

Authors

G. Chen

Oak Ridge National Laboratory
Luis Chacon

Oak Ridge National Laboratory
D.C. Barnes

Coronado Consulting