Refactoring GENE for improved parallel scalability on current and upcoming supercomputers
POSTER
Abstract
GENE is one of the constituent codes of the ECP project WDMApp (Whole Device Modeling Application), where it is designated to simulate gyrokinetic microturbulence in the core of a fusion device.
Legacy GENE uses a single global domain decomposition for the 6-d arrays that represent the distribution functions (3 dimensions for configuration space, 2 for velocity space, and 1 for species). The actual decomposition among MPI processes is determined in an auto-tuning phase. This works very well for calculating the main r.h.s. terms of the gyrokinetic equations. However, these terms also involve lower-dimensional quantities, such as the electrostatic potential and the parallel vector potential, that are calculated in a sequence of dimensionality-lowering operations: integration over v-parallel, gyro-averaging, solving the 3-d field equations, etc. The decomposition chosen for the 6-d domain is not necessarily optimal for these steps, so we propose a refactoring of GENE that allows individual phases of the integrator to be performed on different decompositions. This requires remapping between those decompositions, so the cost of the remapping must be taken into account.
This approach has further advantages. Initialization of a component can be performed on a different decomposition than the application of that operation in the solver, which can improve performance and avoid recomputation, especially in parts of the code where the solver runs on the GPU but initialization does not. In addition, it allows for more self-contained components that can be used to investigate various options for tight and loose coupling in the context of the core-edge coupled WDMApp.
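The trade-off between a better-suited decomposition and the cost of remapping to it can be illustrated with a small sketch. The Python code below is not taken from GENE; the function names (`block_ranges`, `decompose`, `remap_volume`) are hypothetical, and it assumes simple contiguous block decompositions with ranks numbered in row-major order in both layouts. It counts how many grid points would have to change ranks when switching from one decomposition to another.

```python
from itertools import product


def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def decompose(shape, procs):
    """Return one box per rank; a box holds a (lo, hi) index pair for each dimension.

    Ranks are numbered in row-major order (last dimension fastest); this
    numbering is an assumption of the sketch, not something GENE prescribes.
    """
    per_dim = [block_ranges(n, p) for n, p in zip(shape, procs)]
    return list(product(*per_dim))


def overlap(box_a, box_b):
    """Number of grid points contained in both boxes."""
    count = 1
    for (a_lo, a_hi), (b_lo, b_hi) in zip(box_a, box_b):
        count *= max(0, min(a_hi, b_hi) - max(a_lo, b_lo))
    return count


def remap_volume(shape, procs_src, procs_dst):
    """Count the grid points that must move to a different rank during a remap."""
    src = decompose(shape, procs_src)
    dst = decompose(shape, procs_dst)
    moved = 0
    for r_src, box_src in enumerate(src):
        for r_dst, box_dst in enumerate(dst):
            if r_src != r_dst:  # points staying on the same rank cost nothing
                moved += overlap(box_src, box_dst)
    return moved
```

For an 8 x 8 array on four ranks, `remap_volume((8, 8), (4, 1), (1, 4))` reports that 48 of the 64 points change ranks, while remapping to an identical decomposition moves none. A real cost model would also need to account for message counts, network topology, and host-device transfers, which this sketch ignores.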
Presenters
-
Kai Germaschewski
University of New Hampshire
Authors
-
Kai Germaschewski
University of New Hampshire
-
James McClung
University of New Hampshire
-
John Donaghy
University of New Hampshire
-
Gabriele Merlo
University of California, Los Angeles; University of Texas at Austin
-
Bryce Allen
University of Chicago
-
Frank Jenko
University of Texas at Austin; Max Planck Institute for Plasma Physics, Boltzmannstraße 2, 85748 Garching, Germany
-
Amitava Bhattacharjee
Princeton University; Princeton Plasma Physics Laboratory (PPPL)