GPU-Acceleration of the ELPA2 Distributed Eigensolver for Applications in Electronic Structure Theory

ORAL

Abstract

The solution of eigenproblems is often a key computational bottleneck that limits the tractable system size in electronic structure theory. For large systems, these eigenproblems can easily exceed the capacity of a single computer and must therefore be solved on distributed-memory parallel computers. The ELSI library facilitates large-scale electronic structure calculations by providing a unified interface to various fast and scalable eigensolvers and density matrix solvers, including the EigenExa, ELPA, libOMM, NTPoly, PEXSI, and SLEPc libraries. The widespread adoption of hybrid CPU-GPU nodes in supercomputing opens up new opportunities to accelerate electronic structure calculations. Here we present GPU-oriented optimizations of the ELPA two-stage tridiagonalization eigensolver (ELPA2). On top of its existing cuBLAS-based GPU offloading, we add a CUDA kernel to speed up the back-transformation of eigenvectors, which is known to be the main bottleneck of the two-stage tridiagonalization algorithm. CPU, GPU, and MPI activities are overlapped wherever possible. Robust choices that maximize the GPU compute intensity are identified. We demonstrate the performance of this GPU-accelerated eigensolver with a set of benchmark calculations.

Presenters

  • Victor Yu

    Department of Mechanical Engineering and Materials Science, Duke University

Authors

  • Victor Yu

    Department of Mechanical Engineering and Materials Science, Duke University

  • Jonathan Moussa

    The Molecular Sciences Software Institute

  • Volker Blum

    Department of Mechanical Engineering and Materials Science, Duke University; Chemistry, Duke University