Efficient GPU parallelization of first-principles electron-phonon calculations
ORAL
Abstract
Developing scalable software that leverages exascale computing is critically important for first-principles calculations. However, for widely employed workflows focusing on electron-phonon interactions and related transport and nonequilibrium dynamics, taking advantage of GPU hardware remains challenging. In this talk, we present an efficient GPU-parallel implementation of electron-phonon algorithms that employs data structures and code optimized for GPUs. We target both transport and nonequilibrium dynamics calculations in the Boltzmann equation formalism, and achieve a significant performance improvement with a range of strategies, including grouping and sorting contributions from different scattering processes and rewiring key data structures. Benchmark tests for several materials on one GPU node with four NVIDIA A100 GPUs (40 GB) demonstrate a 40x speedup over the original CPU-based implementation running on one AMD EPYC 7763 processor with 64 cores. Additionally, the new implementation exhibits nearly ideal strong scaling up to 32 GPUs, with only a slight decrease in performance up to 64 GPUs, while requiring only a small memory overhead. The talk will also discuss details of the OpenACC implementation in the Perturbo code, as well as building the code on advanced supercomputers using the nvfortran compiler.
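The grouping and sorting strategy mentioned above can be illustrated with a minimal sketch. It is written in Python for readability and is not the Perturbo implementation, which uses Fortran with OpenACC. The data layout (a list of `(target_state, contribution)` pairs) and the function name `accumulate_sorted` are assumptions made for illustration only. The idea shown is that sorting scattering contributions by the state they update makes each state's contributions contiguous, so each group can be reduced independently rather than through scattered, conflicting writes.

```python
from itertools import groupby
from operator import itemgetter


def accumulate_sorted(processes, n_states):
    """Sum scattering contributions per target state.

    processes: list of (target_state, contribution) pairs.
    This layout is hypothetical and chosen only for illustration.
    """
    rates = [0.0] * n_states
    # Sort so that all contributions to the same target state are contiguous.
    ordered = sorted(processes, key=itemgetter(0))
    # Each group is an independent reduction; on a GPU, one group could be
    # assigned to one gang or thread block, avoiding write conflicts.
    for target, group in groupby(ordered, key=itemgetter(0)):
        rates[target] = sum(contribution for _, contribution in group)
    return rates


# Five hypothetical scattering processes acting on three electronic states.
processes = [(2, 0.5), (0, 1.0), (2, 0.25), (1, 2.0), (0, 3.0)]
rates = accumulate_sorted(processes, 3)
print(rates)
```

In a GPU setting, the sort is typically done once up front, so its cost is amortized over the many times the scattering sum is evaluated.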
Presenters
-
Shiyu Peng
Caltech
Authors
-
Shiyu Peng
Caltech
-
Donnie Pinkston
Caltech
-
Jia Yao
Caltech
-
Sergei Kliavinek
Caltech
-
Ivan Maliyov
EPFL, CNRS, Aix-Marseille Université, Caltech
-
Marco Bernardi
Caltech