GPU-enabled extreme scale turbulence simulations: porting to new platforms with potential performance improvements

Rohini Uma-Vaideswaran; Daniel L Dotson; Joshua Romero; Burlen Loring; David Appelhans; Pui-Kuen Yeung

ORAL

Abstract

Using recently published performance data from the leadership-class computer Frontier as a reference (Yeung et al., Comput. Phys. Commun., 306:109364, 2025), we consider the task of porting GPU codes to other platforms with different characteristics, while exploring opportunities for improvement that rapid advances in hardware and software may enable. In particular, we have ported a basic version of the GESTS code from the AMD MI250X GPUs on Frontier to the NVIDIA Grace Hopper (GH200) nodes on Vista at the Texas Advanced Computing Center (TACC). The focus is on distributed 3D Fast Fourier Transforms, which require all-to-all communication with local memory copies before and after. Runs on Vista achieve 98% of peak network bandwidth when communication protocols are optimized for the system, either through manual MPI tuning or through NVIDIA’s cuDecomp library. Beyond the all-to-all itself, cuDecomp yields significant further performance gains by automating the choice of optimal domain decomposition parameters and communication backend. A highly performant reshape operation also enables faster, non-strided FFTs. The benefits seen at 8192^3 resolution on Vista are likely to extend to larger problem sizes in the future.
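The communication pattern named in the abstract (local FFTs, a local copy into send buffers, an all-to-all exchange, a local copy out of receive buffers, then the remaining FFT) can be illustrated with a small single-process sketch. Everything below is an assumption made for illustration: the abstract does not describe the decomposition or data layout used in GESTS, so a 1D slab decomposition is used here, the "ranks" are emulated with Python lists rather than MPI, and the function name `slab_fft3d` is hypothetical. It does not use cuDecomp or any GPU library.

```python
import numpy as np


def slab_fft3d(u, nranks):
    """Emulate a slab-decomposed 3D FFT on `nranks` simulated ranks.

    Illustrative only: ranks are list entries, and the all-to-all is a
    list transpose instead of an MPI call.
    """
    n = u.shape[0]
    assert u.shape == (n, n, n) and n % nranks == 0
    m = n // nranks  # number of planes owned by each rank

    # Each rank owns m contiguous x-planes: local shape (m, n, n).
    local = [u[r * m:(r + 1) * m].astype(complex) for r in range(nranks)]

    # Step 1: transform the two directions that are fully local (y and z).
    local = [np.fft.fftn(a, axes=(1, 2)) for a in local]

    # Step 2: pack. Each rank cuts its slab into nranks blocks along y and
    # copies each block into a contiguous send buffer (the local memory
    # copy before the exchange).
    send = [
        [np.ascontiguousarray(a[:, d * m:(d + 1) * m, :]) for d in range(nranks)]
        for a in local
    ]

    # Step 3: all-to-all. Rank d receives block d from every rank r.
    recv = [[send[r][d] for r in range(nranks)] for d in range(nranks)]

    # Step 4: unpack. Stacking the received blocks along x gives each rank
    # complete x-lines for its own y-range: local shape (n, m, n).
    local = [np.concatenate(blocks, axis=0) for blocks in recv]

    # Step 5: transform the remaining direction (x), which is now local.
    local = [np.fft.fft(a, axis=0) for a in local]

    # Gather for verification only; the result is distributed along y.
    return np.concatenate(local, axis=1)
```

On a small grid the result can be compared against `np.fft.fftn` applied to the full array, for example with an 8^3 random field split across 4 simulated ranks. In a production code, steps 2 to 4 are where network bandwidth and the cost of the pack and unpack copies dominate, which is the part the abstract reports tuning.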

March 17, 2025, 3:18 PM – March 17, 2025, 3:30 PM

Presenters

Rohini Uma-Vaideswaran

Georgia Tech

Authors

Rohini Uma-Vaideswaran

Georgia Tech

Daniel L Dotson

Georgia Institute of Technology

Joshua Romero

NVIDIA Corporation

Burlen Loring

NVIDIA Corporation

David Appelhans

NVIDIA Corporation

Pui-Kuen Yeung

Georgia Institute of Technology