Nested cross-validation loop for performance optimization in imbalanced problems
ORAL
Abstract
A disruption prediction algorithm based on the Random Forest machine learning method has been developed using large databases of both disruptive and non-disruptive discharges from EAST and Alcator C-Mod. The algorithm was trained on time samples of several physics parameters taken during the flattop phase of the plasma current, with each sample assigned to one of two classes according to its proximity to the time of the current quench. Roughly 80% of each database consists of non-disruptive discharges, and only a fraction of the time samples from disruptive discharges are designated as the positive (close to disruption) class. The preponderance of negative class samples therefore results in an imbalanced classification problem, whether it is framed on a time-sample or a discharge-by-discharge basis, and care must be taken to measure prediction performance accurately. This presentation describes a nested K-fold cross-validation procedure for determining an optimal mapping from the individual time-sample predictions of the random forest to an alarm that signals an impending disruption. Sampling methods and performance metrics that address the imbalance between positive and negative classes are also explored.
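To make the nested procedure concrete, the following is a minimal sketch, not the authors' implementation. It runs on synthetic data with the same 80/20 split of non-disruptive to disruptive discharges. Everything named in it is an assumption made for illustration: a simple z-score scorer stands in for the random forest (so the block needs only the Python standard library), the alarm rule "score above a threshold for n consecutive samples" stands in for the mapping from time-sample predictions to a trigger, and the discharge-level F2 score stands in for whichever metric is actually used. The inner loop selects the alarm parameters; the outer loop estimates performance on discharges that played no part in that selection.

```python
import random
from statistics import mean, pstdev

def make_shot(rng, disruptive, n=50):
    """Synthetic 1-D signal; disruptive shots drift upward over the last 10 samples."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if disruptive:
        for i in range(n - 10, n):
            x[i] += 0.5 * (i - (n - 10) + 1)
    return x

def fit_scorer(shots, labels):
    """Hypothetical stand-in for training the random forest:
    learn the mean/std of non-disruptive samples and return a z-score function."""
    ref = [v for s, y in zip(shots, labels) if y == 0 for v in s]
    mu, sd = mean(ref), pstdev(ref)
    return lambda shot: [(v - mu) / sd for v in shot]

def alarm(scores, threshold, n_consec):
    """Assumed mapping: trigger (1) if the score exceeds threshold
    for n_consec consecutive time samples, else 0."""
    run = 0
    for s in scores:
        run = run + 1 if s > threshold else 0
        if run >= n_consec:
            return 1
    return 0

def f_beta(y_true, y_pred, beta=2.0):
    """Discharge-level F-beta; ignores true negatives, so it is not
    inflated by the majority (non-disruptive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denom if denom else 0.0

def stratified_folds(labels, k, rng):
    """Split discharge indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def evaluate(train_idx, test_idx, shots, labels, params):
    """Fit the scorer on train discharges, score alarms on test discharges."""
    scorer = fit_scorer([shots[i] for i in train_idx],
                        [labels[i] for i in train_idx])
    preds = [alarm(scorer(shots[i]), *params) for i in test_idx]
    return f_beta([labels[i] for i in test_idx], preds)

def nested_cv(shots, labels, grid, k_outer=5, k_inner=4, seed=0):
    rng = random.Random(seed)
    outer = stratified_folds(labels, k_outer, rng)
    results = []
    for test_idx in outer:
        train_idx = [i for f in outer if f is not test_idx for i in f]
        # Inner loop: choose alarm parameters using only the outer-train discharges.
        inner = stratified_folds([labels[i] for i in train_idx], k_inner, rng)
        inner = [[train_idx[j] for j in f] for f in inner]

        def inner_score(params):
            return mean(
                evaluate([i for f in inner if f is not val for i in f],
                         val, shots, labels, params)
                for val in inner
            )

        best = max(grid, key=inner_score)
        # Outer loop: refit on all outer-train discharges, test on the held-out fold.
        results.append((best, evaluate(train_idx, test_idx, shots, labels, best)))
    return results

rng = random.Random(42)
labels = [1] * 20 + [0] * 80          # 80% non-disruptive, as in the abstract
shots = [make_shot(rng, y == 1) for y in labels]
grid = [(t, n) for t in (1.0, 2.0, 3.0) for n in (1, 2, 3)]
results = nested_cv(shots, labels, grid)
for params, score in results:
    print(params, round(score, 3))
```

Splitting by discharge rather than by time sample keeps samples from one discharge out of both the training and test sets of a fold, and stratifying the folds preserves the class ratio in each. The spread of the outer-fold scores gives a rough indication of how sensitive the chosen alarm parameters are to the particular discharges used for selection.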
Presenters
- Kevin J Montes
  Massachusetts Inst of Tech-MIT, MIT PSFC
Authors
- Kevin J Montes
  Massachusetts Inst of Tech-MIT, MIT PSFC
- Cristina Rea
  Massachusetts Inst of Tech-MIT, Massachusetts Inst of Tech, MIT PSFC, Massachusetts Institute of Technology
- Robert S Granetz
  Massachusetts Inst of Tech-MIT, Massachusetts Inst of Tech, MIT Plasma Science and Fusion Center, MIT PSFC
- Roy Alexander Tinguely
  MIT PSFC, Massachusetts Inst of Tech-MIT