Memory and Performance Optimization of a GPU Parallelized PIC-MCC Simulator

POSTER

Abstract

Plasma simulations are widely used for semiconductor manufacturing processes such as atomic layer deposition (ALD) [1] and physical vapor deposition (PVD) [2]. Our in-house GPU-parallelized Particle-in-Cell Monte Carlo collision (PIC-MCC) program, "PiCHY" [3], was developed to simulate reactors used in actual processes. Initial implementations of PiCHY treated each mesh cell as an independent region, but the non-uniform distribution of particles in the simulation space led to load-balancing problems. To address this issue, we have introduced a grouping of physically adjacent cells, which we call a "cluster", that the GPU processes as a single unit. This has allowed us to assign GPU threads more effectively, make better use of high-speed shared memory for mesh information, and reduce the number of memory transfers needed for moving particles. While the clusters themselves are sorted, the particles within a cluster are not. The greatest performance improvement has been in the functions handling particle movement, and it far outweighs the minor increase in bookkeeping cost and the added complexity. Overall, we have observed a 1.5x speedup with the GEC reference case.
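As an illustration of the cluster idea only, the following is a minimal CPU-side sketch in Python, not the PiCHY implementation. It assumes a uniform 2D mesh and rectangular clusters; the function names, the cluster shape, and the use of a counting sort are all hypothetical choices made for this example. Particles are binned by cluster so that each cluster occupies a contiguous range, while no ordering by cell is imposed inside a cluster.

```python
def cluster_of_cell(ix, iy, nx_clusters, cluster_w, cluster_h):
    """Map a cell index (ix, iy) to the id of the cluster containing it."""
    return (iy // cluster_h) * nx_clusters + (ix // cluster_w)


def bin_particles_by_cluster(cells, nx, ny, cluster_w, cluster_h):
    """Counting sort of particles by cluster id (hypothetical helper).

    cells: list of (ix, iy) cell indices, one entry per particle.
    Returns (offsets, order): the particles of cluster c are
    order[offsets[c]:offsets[c + 1]]. Particles inside a cluster are
    not sorted by cell; they simply keep their input order.
    """
    nx_clusters = -(-nx // cluster_w)  # ceiling division
    ny_clusters = -(-ny // cluster_h)
    n_clusters = nx_clusters * ny_clusters

    # Pass 1: compute each particle's cluster id and count per cluster.
    ids = []
    counts = [0] * n_clusters
    for ix, iy in cells:
        c = cluster_of_cell(ix, iy, nx_clusters, cluster_w, cluster_h)
        ids.append(c)
        counts[c] += 1

    # Exclusive prefix sum gives the start offset of each cluster.
    offsets = [0] * (n_clusters + 1)
    for c in range(n_clusters):
        offsets[c + 1] = offsets[c] + counts[c]

    # Pass 2: scatter particle indices into their cluster's range.
    cursor = offsets[:-1]  # slicing copies, so offsets is left unchanged
    order = [0] * len(cells)
    for p, c in enumerate(ids):
        order[cursor[c]] = p
        cursor[c] += 1
    return offsets, order


# Example: a 4 x 4 mesh split into four 2 x 2 clusters, five particles.
cells = [(0, 0), (3, 3), (1, 1), (2, 0), (0, 3)]
offsets, order = bin_particles_by_cluster(cells, 4, 4, 2, 2)
```

In a GPU setting, each contiguous range in `order` could be handed to one thread block, which is one way the mesh data for a cluster might be staged in shared memory once and reused by all of its particles.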

Publication:

[1] K. Denpoh, P. Moroz, T. Kato, and M. Matsukuma, Jpn. J. Appl. Phys. 59, SHHB02 (2020).

[2] J. T. Gudmundsson, Plasma Sources Sci. Technol. 29, 113001 (2020).

[3] J. S. Kim, K. Denpoh, M. Anderson, and M. Matsukuma, 77th Annual Gaseous Electronics Conference, DF.100003 (2024).

Presenters

  • Matthew Anderson

    Tokyo Electron Technology Solutions Limited

Authors

  • Matthew Anderson

    Tokyo Electron Technology Solutions Limited

  • Kim Jinseok

    Tokyo Electron Technology Solutions Limited

  • Masaaki Matsukuma

    Tokyo Electron Technology Solutions Limited