Task Graph Scheduler: A Library for Dynamic Runtime Scheduling in MPI Applications

Hilario C Torres; Scott Murman


ORAL

Abstract

The current state of the practice for Single-Program, Multiple-Data (SPMD) applications uses a bulk-synchronous parallel (BSP) paradigm implemented with non-blocking Message Passing Interface (MPI) communication calls. In this paradigm, the execution order of the computational kernels is hard-coded at compile time so that communication and computation overlap in a synchronized fashion. In simple applications, this approach is relatively easy to implement and can provide sufficient parallel scalability. However, it is difficult to specify a performant schedule at compile time for applications that simultaneously run multiple interdependent algorithms on a diverse set of data structures. This presentation covers a library we developed to address this problem: it schedules computational kernels dynamically at runtime, using directed acyclic graphs (DAGs) to track the data dependencies between kernels. The system is specifically designed to reuse existing computational infrastructure as much as possible, which eases its adoption in legacy applications. The scheduling system is demonstrated using eddy, the high-order multi-physics solver developed at NASA. We will discuss implementation details, our experience using the system, and its performance.
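The abstract does not describe the library's interface, so the following is only a minimal sketch of the general idea under our own assumptions. Kernels are nodes in a DAG, and each kernel runs as soon as its dependencies finish, rather than in a fixed compile-time order. A kernel may also start asynchronous work and return a completion test (a stand-in for polling a non-blocking MPI request with `MPI_Test`). The scheduler polls that test while it runs other ready kernels. All names here (`DagScheduler`, `add_task`, `execute`, `fake_request`) are hypothetical and are not the library's actual API. No real MPI calls are made.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional

# A kernel either finishes immediately (returns None) or starts asynchronous
# work and returns a completion test, loosely analogous to MPI_Test on a request.
Completion = Optional[Callable[[], bool]]


@dataclass
class Task:
    name: str
    run: Callable[[], Completion]
    deps: list[str] = field(default_factory=list)


class DagScheduler:
    """Hypothetical DAG-based runtime scheduler (illustrative sketch only)."""

    def __init__(self) -> None:
        self.tasks: dict[str, Task] = {}

    def add_task(self, name: str, run: Callable[[], Completion], deps=()) -> None:
        if name in self.tasks:
            raise ValueError(f"duplicate task {name!r}")
        self.tasks[name] = Task(name, run, list(deps))

    def execute(self) -> list[str]:
        """Run each task once its dependencies complete; return completion order."""
        unmet = {n: len(t.deps) for n, t in self.tasks.items()}
        dependents: dict[str, list[str]] = {n: [] for n in self.tasks}
        for n, t in self.tasks.items():
            for d in t.deps:
                if d not in self.tasks:
                    raise KeyError(f"{n!r} depends on unknown task {d!r}")
                dependents[d].append(n)

        ready = deque(n for n, c in unmet.items() if c == 0)
        in_flight: list[tuple[str, Callable[[], bool]]] = []
        order: list[str] = []

        def finish(name: str) -> None:
            # Record completion and release any dependents whose inputs are now ready.
            order.append(name)
            for m in dependents[name]:
                unmet[m] -= 1
                if unmet[m] == 0:
                    ready.append(m)

        while ready or in_flight:
            # Poll outstanding asynchronous work (e.g., halo exchanges).
            pending = []
            for name, test in in_flight:
                if test():
                    finish(name)
                else:
                    pending.append((name, test))
            in_flight = pending

            # Run one ready kernel, overlapping it with any in-flight work.
            if ready:
                name = ready.popleft()
                completion = self.tasks[name].run()
                if completion is None:
                    finish(name)
                else:
                    in_flight.append((name, completion))

        if len(order) != len(self.tasks):
            raise RuntimeError("dependency cycle: some tasks never became ready")
        return order


def fake_request(polls_needed: int) -> Callable[[], bool]:
    """Hypothetical stand-in for a non-blocking request that completes after N polls."""
    count = 0

    def test() -> bool:
        nonlocal count
        count += 1
        return count >= polls_needed

    return test


if __name__ == "__main__":
    sched = DagScheduler()
    sched.add_task("post_halo", lambda: fake_request(3))
    sched.add_task("interior", lambda: None)
    sched.add_task("boundary", lambda: None, deps=["post_halo"])
    sched.add_task("update", lambda: None, deps=["interior", "boundary"])
    print(sched.execute())  # ['interior', 'post_halo', 'boundary', 'update']
```

In the usage example, the interior kernel runs while the simulated halo exchange is still in flight. The boundary kernel waits until the exchange completes, so the overlap is discovered at runtime from the dependency graph rather than written into the code by hand.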

Nov. 19, 2023, 3:13 PM – 3:26 PM

Presenters

Hilario C Torres

NASA Ames Research Center

Authors

Hilario C Torres

NASA Ames Research Center

Scott Murman

NASA Ames Research Center