Efficient GPU parallelization of first-principles electron-phonon calculations

ORAL

Abstract

Developing scalable software that leverages exascale computing is critically important for first-principles calculations. However, for widely employed workflows focusing on electron-phonon interactions and related transport and nonequilibrium dynamics, taking advantage of GPU hardware remains challenging. In this talk, we present an efficient GPU-parallel implementation of electron-phonon algorithms that employs data structures and code optimized for GPUs. We target both transport and nonequilibrium dynamics calculations in the Boltzmann equation formalism, and achieve a significant performance improvement using a range of strategies, including grouping and sorting contributions from different scattering processes and rewiring key data structures. Benchmark tests for several materials on one GPU node with four NVIDIA A100 GPUs (40 GB) demonstrate a 40x speedup over the original CPU-based implementation on one AMD EPYC 7763 processor with 64 cores. Additionally, the new implementation exhibits nearly ideal strong scaling up to 32 GPUs, with only a slight decrease in performance up to 64 GPUs, while requiring only a small memory overhead. The talk will also discuss details of the OpenACC implementation in the Perturbo code, as well as building the code on advanced supercomputers using the nvfortran compiler.
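
As a rough illustration of why grouping and sorting scattering contributions can help on GPUs, the sketch below compares two ways of accumulating per-state scattering rates. It is not taken from Perturbo: the function names, the `(target_state, value)` layout, and the sample numbers are all hypothetical, and plain Python stands in for the Fortran/OpenACC code the abstract refers to. The assumption is that each scattering process adds a contribution to one target electronic state; sorting the processes by target state turns scattered writes into contiguous groups that can each be reduced independently.

```python
from itertools import groupby
from operator import itemgetter


def accumulate_scattered(n_states, processes):
    """Reference version: add each contribution directly to its target state.

    On a GPU, many threads writing to the same rates[target] would need
    atomic updates (an assumption about the unsorted layout, for illustration).
    """
    rates = [0.0] * n_states
    for target, value in processes:
        rates[target] += value
    return rates


def accumulate_grouped(n_states, processes):
    """Sort by target state, then reduce each contiguous group on its own.

    Each group writes to exactly one entry of rates, so the groups are
    independent and could be handed to separate GPU threads or gangs.
    """
    ordered = sorted(processes, key=itemgetter(0))  # stable sort by target state
    rates = [0.0] * n_states
    for target, group in groupby(ordered, key=itemgetter(0)):
        rates[target] = sum(value for _, value in group)
    return rates


# Hypothetical contributions: (target_state, value)
processes = [(2, 0.5), (0, 1.0), (2, 0.25), (1, 2.0), (0, 3.0)]
rates_scattered = accumulate_scattered(3, processes)
rates_grouped = accumulate_grouped(3, processes)
```

Both functions return the same rates; the grouped version only changes the order in which work is laid out, which is the kind of restructuring that can reduce write conflicts on a GPU.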

Presenters

  • Shiyu Peng

    Caltech

Authors

  • Shiyu Peng

    Caltech

  • Donnie Pinkston

    Caltech

  • Jia Yao

    Caltech

  • Sergei Kliavinek

    Caltech

  • Ivan Maliyov

    EPFL, CNRS, Aix-Marseille Université, Caltech

  • Marco Bernardi

    Caltech