Scaling Salt on Large Machines: Leveraging High-Performance Computing to Train ATLAS Flavor-Tagging Models
ORAL
Abstract
Salt is the software framework used to design and train neural networks such as GN2, the latest ATLAS flavor-tagging model. The impressive performance gains of GN2 over previous taggers are partly the result of greater model complexity and a larger training dataset. As these grow, so too do the time and computation required to train the model, limiting model-development techniques such as a thorough hyperparameter search to tune the GN2 architecture. High-performance computing (HPC) allows workloads to be distributed across many devices in parallel, reducing the overall training time. Salt is extended to use the Slurm batch manager to accomplish this. Candidate HPC systems are discussed, and the results of scaling efforts on runtime and performance are presented.
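As a rough illustration of the approach described above, a Slurm batch script can request several GPU nodes and launch one training process per GPU with `srun`; frameworks built on PyTorch Lightning (as Salt is) can then read Slurm's environment variables to discover each process's rank and form the distributed group. This is a minimal sketch, not the actual ATLAS configuration: the resource counts, partition, and the `salt fit` invocation and config filename are illustrative assumptions.

```shell
#!/bin/bash
#SBATCH --job-name=salt-gn2         # illustrative job name
#SBATCH --nodes=4                   # distribute training over 4 nodes
#SBATCH --ntasks-per-node=4         # one task (training process) per GPU
#SBATCH --gpus-per-node=4           # GPUs available on each node
#SBATCH --time=04:00:00             # wall-clock limit; tune for the dataset

# srun starts ntasks-per-node processes on every node. Each process can
# determine its global rank and world size from SLURM_PROCID and
# SLURM_NTASKS, which distributed-training launchers use to initialize
# multi-node data-parallel training.
srun salt fit --config gn2.yaml     # hypothetical Salt CLI invocation
```

Scaling the node count in this script (while keeping the per-GPU batch size fixed) is the usual way to trade more hardware for shorter training time, which is the trade-off the abstract's scaling results measure.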
Presenters
- Nicholas Luongo (Argonne National Laboratory)
Authors
- Nicholas Luongo (Argonne National Laboratory)