
Scenario Optimization for NSTX-U via Reinforcement Learning

POSTER

Abstract

Reliably achieving desired operating regimes using available actuators is crucial for nuclear fusion to become a viable energy source. In this work, a Reinforcement Learning (RL) agent is trained on the plasma simulation code COTSIM (Control Oriented Transport SIMulator) to determine optimal actuator trajectories for reaching target plasma regimes. These regimes are characterized by high normalized beta and a significant fraction of noninductive current drive, aligning with the core operational objectives of NSTX-U. During training, the target is systematically varied, enabling a single agent to reach multiple distinct targets within a given parameter space. This allows for fast between-shot optimizations that can adapt to varying experimental circumstances. To discourage the agent from exploring regimes associated with plasma instabilities, nonlinear constraints are incorporated into the reward function via penalty terms. Additionally, an actuator mask is used to accommodate optimizations of different dimensionality and to model potential actuator failures. Multiple RL architectures are evaluated for their effectiveness, flexibility, and speed, and are compared with several gradient-based and gradient-free optimization methods.
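The penalty-term reward and actuator mask described above can be sketched as follows. This is an illustrative toy sketch, not the authors' implementation: the state pair (normalized beta, noninductive fraction), the stability limit `beta_n_limit`, the penalty weight, and the mask convention (1 = actuator available, 0 = failed/excluded, replaced by a default value) are all assumptions made for the example.

```python
import numpy as np

def masked_action(action, mask, defaults):
    """Apply an actuator mask: keep entries where mask == 1, and
    substitute a default value where the actuator is unavailable
    (failed or excluded from the optimization). Hypothetical convention."""
    return np.where(np.asarray(mask).astype(bool),
                    np.asarray(action), np.asarray(defaults))

def reward(state, target, beta_n_limit=3.5, penalty_weight=10.0):
    """Illustrative reward: negative squared tracking error on the
    (beta_N, f_NI) pair, minus a soft-penalty term that encodes a
    nonlinear stability constraint (here, a beta_N limit)."""
    beta_n, _f_ni = state
    tracking = -np.sum((np.asarray(state) - np.asarray(target)) ** 2)
    # Nonlinear constraint handled as a penalty inside the reward,
    # discouraging exploration of unstable regimes.
    penalty = penalty_weight * max(0.0, beta_n - beta_n_limit) ** 2
    return tracking - penalty
```

In this sketch, varying `target` across training episodes is what would let a single agent learn to reach multiple targets in the parameter space, and the mask lets the same policy interface handle optimizations of reduced dimensionality.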

Presenters

  • Brian Robert Leard

    Lehigh University

Authors

  • Brian Robert Leard

    Lehigh University

  • Sai Tej Paruchuri

    Lehigh University

  • Tariq Rafiq

    Lehigh University

  • Eugenio Schuster

    Lehigh University