Overcoming Communication Latency Barriers in Massively Parallel Molecular Dynamics Simulations on Anton

COFFEE_KLATCH · Invited

Abstract

Strong scaling of scientific applications on parallel architectures is increasingly limited by communication latency. This talk will describe the techniques used to reduce latency and mitigate its effects on performance in Anton, a massively parallel special-purpose machine that accelerates molecular dynamics (MD) simulations by orders of magnitude compared with the previous state of the art. Achieving this speedup required both specialized hardware mechanisms and a restructuring of the application software to reduce network latency, sender and receiver overhead, and synchronization costs. Key elements of Anton's approach, in addition to tightly integrated communication hardware, include formulating data transfer in terms of counted remote writes and leveraging fine-grained communication. Anton delivers end-to-end inter-node latency significantly lower than any other large-scale parallel machine, and the total critical-path communication time for an Anton MD simulation is less than 3{\%} that of the next-fastest MD platform.

Authors

  • Ron Dror

    D. E. Shaw Research