
Scenario Optimization for NSTX-U via Reinforcement Learning

POSTER

Abstract

Reliably achieving desired operating regimes using available actuators is crucial for nuclear fusion to become a viable energy source. In this work, a Reinforcement Learning (RL) agent is trained on the plasma simulation code COTSIM (Control Oriented Transport SIMulator) to determine optimal actuator trajectories for reaching target plasma regimes. These regimes are characterized by high normalized beta and a significant fraction of noninductive current drive, aligning with the core operational objectives of NSTX-U. During training, the target is systematically varied, enabling a single agent to reach multiple distinct targets within a given parameter space. This allows for fast between-shot optimizations that can adapt to varying experimental circumstances. To discourage the agent from exploring regimes associated with plasma instabilities, nonlinear constraints are incorporated into the reward function via penalty terms. Additionally, an actuator mask is used to accommodate optimizations of different dimensionality and to model potential actuator failures. Multiple RL architectures are evaluated for their effectiveness, flexibility, and speed, and are compared with several gradient-based and gradient-free optimization methods.
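The penalty-term reward and actuator mask described above can be sketched as follows. This is an illustrative toy sketch, not the authors' implementation: the state pair (normalized beta, noninductive fraction), the stability limit `beta_n_limit`, the penalty weight, and the mask convention (1 = actuator available, 0 = failed/excluded, replaced by a default value) are all assumptions made for the example.

```python
import numpy as np

def masked_action(action, mask, defaults):
    """Apply an actuator mask: keep entries where mask == 1, and
    substitute a default value where the actuator is unavailable
    (failed or excluded from the optimization). Hypothetical convention."""
    return np.where(np.asarray(mask).astype(bool),
                    np.asarray(action), np.asarray(defaults))

def reward(state, target, beta_n_limit=3.5, penalty_weight=10.0):
    """Illustrative reward: negative squared tracking error on the
    (beta_N, f_NI) pair, minus a soft-penalty term that encodes a
    nonlinear stability constraint (here, a beta_N limit)."""
    beta_n, _f_ni = state
    tracking = -np.sum((np.asarray(state) - np.asarray(target)) ** 2)
    # Nonlinear constraint handled as a penalty inside the reward,
    # discouraging exploration of unstable regimes.
    penalty = penalty_weight * max(0.0, beta_n - beta_n_limit) ** 2
    return tracking - penalty
```

In this sketch, varying `target` across training episodes is what would let a single agent learn to reach multiple targets in the parameter space, and the mask lets the same policy interface handle optimizations of reduced dimensionality.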

Presenters

  • Brian Robert Leard

    Lehigh University

Authors

  • Brian Robert Leard

    Lehigh University

  • Sai Tej Paruchuri

    Lehigh University

  • Tariq Rafiq

    Lehigh University

  • Eugenio Schuster

    Lehigh University