Exploring novel algorithmic strategies for optimal performance of discontinuous Galerkin-based flow solvers on GPU-based systems
ORAL
Abstract
Heterogeneous architectures, particularly those based on graphics processing units (GPUs), are becoming increasingly prevalent in high-performance computing. However, GPU programming is challenging, and achieving optimal performance requires a different approach from that used in traditional CPU-based implementations. In this talk, we present a case study of a computational fluid dynamics (CFD) application based on the high-order discontinuous Galerkin (DG) method. While the DG method is well suited to parallelization, the surface term computations for the gradients and fluxes present a significant performance bottleneck on GPUs because they involve non-contiguous memory accesses. To address this issue, we explore novel performance strategies such as the use of shared memory, intermediate memory coalescing, and alternative data layouts for the element nodes. The implementation and performance metrics for these strategies will be assessed across different GPU architectures using the OCCA portability framework. The overall speedup of the application for different polynomial orders will also be evaluated and presented. The goal is to investigate novel approaches for optimal performance of DG-based CFD applications on GPUs, which will contribute to the advancement of scientific studies using CFD simulations on modern supercomputers.
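To make the memory-access issue concrete, the following is a minimal sketch, not the implementation from the talk and not OCCA code. It assumes a quadrilateral element with (N+1)^2 tensor-product nodes numbered n = j*(N+1) + i, and compares two possible index layouts for a nodal field. The function names (idxElementMajor, idxNodeMajor, rightFaceNodes) are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration; not taken from the talk or from OCCA.

// Element-major layout: all Np nodes of one element are stored contiguously.
inline std::size_t idxElementMajor(std::size_t e, std::size_t n, std::size_t Np) {
  assert(n < Np);  // local node index must lie inside the element
  return e * Np + n;
}

// Node-major layout: the same local node of consecutive elements is contiguous.
inline std::size_t idxNodeMajor(std::size_t e, std::size_t n, std::size_t K) {
  assert(e < K);  // element index must lie inside the mesh
  return n * K + e;
}

// Local node indices on the i = N face of a quadrilateral element,
// assuming tensor-product numbering n = j*(N+1) + i.
inline std::vector<std::size_t> rightFaceNodes(std::size_t N) {
  std::vector<std::size_t> nodes;
  for (std::size_t j = 0; j <= N; ++j) {
    nodes.push_back(j * (N + 1) + N);
  }
  return nodes;
}
```

Under these assumptions, the face nodes for N = 2 are 2, 5, and 8. In the element-major layout, neighboring face nodes of one element are N+1 entries apart, so threads that each read one face node touch strided addresses. In the node-major layout, a fixed face node of consecutive elements occupies adjacent entries, so threads mapped to elements read contiguous memory. The trade-off is that accesses within a single element become strided in the node-major layout, which is one reason a layout choice has to be measured rather than assumed.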
Presenters
- Umesh Unnikrishnan, Argonne National Laboratory
Authors
- Umesh Unnikrishnan, Argonne National Laboratory
- Kris Rowe, Argonne National Laboratory
- Saumil S Patel, Argonne National Laboratory