GPU-enabled extreme scale turbulence simulations: porting to new platforms with potential performance improvements
ORAL
Abstract
Using recently published performance data from the leadership-class computer Frontier as a reference (Yeung et al., Comput. Phys. Commun., 306:109364, 2025), we consider the task of porting GPU codes to other platforms with different characteristics, while exploring new opportunities for improvement that rapid advances in hardware and software may enable. In particular, we have ported a basic version of the GESTS code from the AMD MI250X GPUs on Frontier to the NVIDIA Grace Hopper (GH200) nodes on Vista at TACC, with a focus on distributed 3D Fast Fourier Transforms, which require all-to-all communication with local memory copies before and after. Runs on Vista achieve 98% of peak network bandwidth when either manual MPI tuning or NVIDIA’s cuDecomp library is used to optimize communication protocols for the given system. Significant performance improvements beyond the all-to-all are obtained from the cuDecomp library, which automates the choice of optimal domain decomposition parameters and communication backend. A highly performant reshape operation also enables faster non-strided FFTs. The benefits seen at 8192^3 resolution on Vista are likely to extend to larger problem sizes in the future.
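The pattern of all-to-all communication with local memory copies before and after can be illustrated with a toy example. The sketch below is not taken from GESTS or cuDecomp and uses neither MPI nor GPUs: it simulates the transpose step on a small 2D array in pure Python, with "ranks" represented as list entries. All function names (`make_slabs`, `pack`, `all_to_all`, `unpack`, `distributed_transpose`) are hypothetical and chosen only for illustration, and the 2D slab layout is a simplifying assumption standing in for the 3D decompositions used in practice.

```python
# Toy simulation of the transpose step in a slab-decomposed distributed FFT.
# Assumption: p "ranks" are modeled as list entries; no MPI or GPU is involved.

def make_slabs(n, p):
    """Rank r owns n/p consecutive rows of the n x n array a[i][j] = i*n + j."""
    rows = n // p
    return [[[i * n + j for j in range(n)]
             for i in range(r * rows, (r + 1) * rows)]
            for r in range(p)]

def pack(slab, p):
    """Local copy before communication: cut the slab into p column blocks,
    one contiguous send buffer per destination rank."""
    cols = len(slab[0]) // p
    return [[row[d * cols:(d + 1) * cols] for row in slab] for d in range(p)]

def all_to_all(send):
    """Exchange send[src][dst] -> recv[dst][src]; stands in for an MPI all-to-all."""
    p = len(send)
    return [[send[src][dst] for src in range(p)] for dst in range(p)]

def unpack(recv):
    """Local copy after communication: stack the blocks received from ranks
    0..p-1 in row order, then transpose locally so that each rank holds
    complete columns of the global array, stored as contiguous rows."""
    stacked = [row for block in recv for row in block]
    return [list(col) for col in zip(*stacked)]

def distributed_transpose(n, p):
    """Pack, exchange, and unpack; returns one transposed slab per rank."""
    slabs = make_slabs(n, p)
    send = [pack(slab, p) for slab in slabs]
    recv = all_to_all(send)
    return [unpack(r) for r in recv]

if __name__ == "__main__":
    for rank, slab in enumerate(distributed_transpose(4, 2)):
        print(rank, slab)
```

After the exchange, each rank holds full lines along the direction that was previously distributed, so a 1D FFT along that direction could act on contiguous (non-strided) data. In this toy version the local transpose inside `unpack` plays the role of a reshape operation.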
Presenters
-
Rohini Uma-Vaideswaran
Georgia Tech
Authors
-
Rohini Uma-Vaideswaran
Georgia Tech
-
Daniel L Dotson
Georgia Institute of Technology
-
Joshua Romero
NVIDIA Corporation
-
Burlen Loring
NVIDIA Corporation
-
David Appelhans
NVIDIA Corporation
-
Pui-Kuen Yeung
Georgia Institute of Technology