APS Logo

A Benchmark of the FHI-aims Density Functional Theory Code on the A64FX and GRACE Processors on the Ookami Testbed

ORAL

Abstract

Density functional theory (DFT) is used throughout the physical sciences. Because DFT calculations are notoriously computationally expensive, the performance of DFT codes can depend on the choice of processor and compiler. With the introduction of new CPUs such as the A64FX and GRACE processors, it is important to benchmark existing DFT codes to identify bottlenecks and areas for improvement. In this work, we benchmark FHI-aims, a representative DFT code based on numerical atomic orbitals (NAOs), on the A64FX and GRACE processors recently made available on the Ookami testbed at Stony Brook University. FHI-aims is compiled with the ARM and GNU compilers on the A64FX and with the GNU compiler on GRACE. As a baseline, we also compile the code with the Intel compilers on the Intel Skylake processor available on Ookami, and with the GNU and Intel compilers on the AMD EPYC nodes of Bridges-2 at the Pittsburgh Supercomputing Center. We also examine the effect of the ScaLAPACK build and of kernel optimizations on the performance of the code. Both generalized gradient approximation (GGA) and hybrid functionals are examined. The AMD, GRACE, and Intel processors perform similarly, whereas the A64FX is in some cases an order of magnitude slower. The kernel optimizations that FHI-aims recommends for improving performance on Intel processors appear to have no effect on the calculations examined here, and the choice of ScaLAPACK build appears to have little impact on performance. The GRACE processor thus emerges as a promising new hardware platform for this DFT code.

Presenters

  • Dana O'Connor

    Pittsburgh Supercomputing Center

Authors

  • Dana O'Connor

    Pittsburgh Supercomputing Center

  • Paola Buitrago

    Pittsburgh Supercomputing Center