A customised implementation of implicit high-order compact finite difference schemes in Xcompact3d targeting heterogeneous architectures
ORAL
Abstract
Implicit high-order finite difference schemes have significant advantages over low-order schemes when simulating turbulent flows. However, they lead to tridiagonal systems that are hard to solve efficiently in distributed-memory environments. The authors have developed a new algorithm for solving tridiagonal systems on distributed heterogeneous architectures. The customised algorithm uses a specialist data structure that gives a linear data access pattern to maximise bandwidth throughput, enables vectorisation on CPUs and thread-level parallelism on GPUs in all spatial directions, and reduces data movement between chip and main memory through a combination of cache blocking and fusion strategies. The algorithm also exploits the diagonal dominance of the tridiagonal systems that arise from high-order implicit schemes. This significantly reduces the communication requirements: only pseudo-local communication between neighbouring subdomains is needed. The new algorithm has been implemented in Xcompact3d, a suite of flow solvers dedicated to the study of turbulent flows. Its potential and performance will be demonstrated with simulations of turbulent flows performed with Xcompact3d on CPUs and GPUs.
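To illustrate why data layout matters for this class of problem, the following is a minimal sketch of a batched Thomas algorithm in Python with NumPy. It is not the authors' algorithm and does not cover the distributed-memory part. It only shows, under simplifying assumptions, how many independent tridiagonal systems can be solved together so that each elimination step is a vector operation across the batch. The function name `batched_thomas`, the `(n, batch)` array layout, and the constant off-diagonal value of 1/3 are illustrative choices, and boundary closures are ignored.

```python
import numpy as np


def batched_thomas(a, b, c, d):
    """Solve many independent tridiagonal systems that share one matrix.

    a, b, c: sub-, main and super-diagonals, each of shape (n,).
             a[0] and c[-1] are not used.
    d:       right-hand sides of shape (n, batch); each column is one system.
    Returns x with the same shape as d.
    """
    n = b.shape[0]
    cp = np.empty(n)
    dp = np.empty(d.shape, dtype=float)

    # Forward elimination: the loop over i is sequential, but each step
    # acts on a whole row of d, i.e. on every system in the batch at once.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution, again vectorised across the batch.
    x = np.empty_like(dp)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Illustrative, diagonally dominant system: unit diagonal, 1/3 off-diagonals.
n, batch = 16, 8
a = np.full(n, 1.0 / 3.0)
b = np.ones(n)
c = np.full(n, 1.0 / 3.0)
rng = np.random.default_rng(0)
d = rng.standard_normal((n, batch))
x = batched_thomas(a, b, c, d)
```

Storing the batch index as the contiguous dimension means each step of the sequential recurrence reads and writes memory linearly. This is one simple way to obtain the kind of vectorisable access pattern the abstract describes.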
–
Publication: A Distributed Memory Tridiagonal Solver Based on a Specialist Data Structure Optimised for CPU and GPU Architectures (In progress, planning to submit to CPC in August 2024)
Presenters
-
Semih Akkurt
Imperial College London
Authors
-
Semih Akkurt
Imperial College London
-
Sebastien Lemaire
EPCC, The University of Edinburgh
-
Paul Bartholomew
EPCC, The University of Edinburgh
-
Jacques Xing
Imperial College London
-
Sylvain Laizet
Department of Aeronautics, Imperial College London