APS Logo

GPU-enabled extreme-scale turbulence simulations: porting to new platforms with potential performance improvements

ORAL

Abstract

With recently published performance data from the leadership-class computer Frontier as a reference (Yeung et al., Comput. Phys. Commun., 306:109364, 2025), we consider the task of porting GPU codes to other platforms with different characteristics, while exploring new opportunities for improvement enabled by rapid advances in hardware and software. In particular, we have ported a basic version of the GESTS code from the AMD MI250X GPUs on Frontier to the NVIDIA Grace Hopper (GH200) nodes on Vista at the Texas Advanced Computing Center (TACC), focusing on distributed three-dimensional (3D) fast Fourier transforms (FFTs), which require all-to-all communication preceded and followed by local memory copies. Runs on Vista achieve 98% of peak network bandwidth when the communication protocols are optimized for the system, either by manual MPI tuning or with NVIDIA’s cuDecomp library. Further significant performance gains, beyond those in the all-to-all itself, come from cuDecomp’s automated selection of the optimal domain decomposition parameters and communication backend. A high-performance reshape operation also enables faster, non-strided FFTs. The benefits observed at 8192^3 resolution on Vista are likely to extend to larger problem sizes in the future.
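The data movement described above (a local copy, an all-to-all exchange, another local copy, then a reshape so the next FFT runs over contiguous data) can be illustrated with a small single-process sketch. It is not GESTS code and does not use the cuDecomp API: for brevity it assumes a 1D slab decomposition of an N x N x N grid over P simulated ranks, replaces MPI_Alltoall with an in-memory exchange, and uses a naive DFT in place of a GPU FFT library. All function names (`pack`, `alltoall`, `unpack`, `reshape_x_last`, `fft_x`, and so on) are hypothetical. Production codes commonly use 2D pencil decompositions, and the specific decomposition parameters chosen on Vista are not given here.

```python
"""Hypothetical sketch: slab-decomposed transpose for a distributed 3D FFT.

P simulated "ranks" each own an x-slab of an N x N x N grid. Moving to
y-slabs takes a pack (local copy), an all-to-all exchange (simulated here,
standing in for MPI_Alltoall), and an unpack (local copy). A reshape then
makes x the innermost axis so the x-direction FFT is non-strided.
"""
import cmath
from typing import List

Grid = List[List[List[complex]]]  # indexed [axis0][axis1][axis2]


def make_global(n: int) -> Grid:
    # Deterministic test data: each value encodes its own (x, y, z) position.
    return [[[complex(x * n * n + y * n + z) for z in range(n)]
             for y in range(n)] for x in range(n)]


def split_x_slabs(g: Grid, p: int) -> List[Grid]:
    # Rank r owns x in [r*b, (r+1)*b) for all y and z.
    b = len(g) // p
    return [g[r * b:(r + 1) * b] for r in range(p)]


def pack(local: Grid, p: int) -> List[Grid]:
    # Local copy BEFORE communication: one block per destination rank,
    # holding the y-range that destination will own after the transpose.
    b = len(local[0]) // p
    return [[row[d * b:(d + 1) * b] for row in local] for d in range(p)]


def alltoall(send: List[List[Grid]]) -> List[List[Grid]]:
    # Simulated all-to-all: rank r receives block r from every source s.
    p = len(send)
    return [[send[s][r] for s in range(p)] for r in range(p)]


def unpack(recv: List[Grid]) -> Grid:
    # Local copy AFTER communication: stack the blocks in source order,
    # giving this rank every x for its own y-range, indexed [x][y_local][z].
    return [row for block in recv for row in block]


def transpose_x_to_y(slabs: List[Grid]) -> List[Grid]:
    p = len(slabs)
    send = [pack(s, p) for s in slabs]
    return [unpack(r) for r in alltoall(send)]


def reshape_x_last(local: Grid) -> Grid:
    # Permute [x][y][z] -> [y][z][x] so x is innermost (contiguous),
    # allowing a non-strided 1D transform along x.
    nx, ny, nz = len(local), len(local[0]), len(local[0][0])
    return [[[local[x][y][z] for x in range(nx)] for z in range(nz)]
            for y in range(ny)]


def dft(v: List[complex]) -> List[complex]:
    # Naive O(n^2) DFT standing in for a real FFT library call.
    n = len(v)
    return [sum(v[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def fft_x(y_slab: Grid) -> Grid:
    # Transform every contiguous x-line of the reshaped y-slab.
    return [[dft(line) for line in plane] for plane in reshape_x_last(y_slab)]
```

For example, with `n, p = 4, 2`, `transpose_x_to_y(split_x_slabs(make_global(n), p))` returns two y-slabs in which rank `r` holds `g[x][r*2 + j][z]` for every `x`. The design point the sketch tries to convey is that the pack and unpack copies, not only the exchange, touch every element, which is why fast local copies and reshapes matter alongside network bandwidth.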

Presenters

  • Rohini Uma-Vaideswaran

    Georgia Institute of Technology

Authors

  • Rohini Uma-Vaideswaran

    Georgia Institute of Technology

  • Daniel L Dotson

    Georgia Institute of Technology

  • Joshua Romero

    NVIDIA Corporation

  • Burlen Loring

    NVIDIA Corporation

  • David Appelhans

    NVIDIA Corporation

  • Pui-Kuen Yeung

    Georgia Institute of Technology