Developing an ML-based Surrogate Model for Plasma Boundary Prediction

Precise plasma shape control is essential to the success of fusion experiments in tokamaks (for more on plasma control, see [1, 2] and the references therein). Plasma shape parameters strongly affect plasma stability, energy retention, and confinement quality. Traditional simulation methods, however, must solve the Grad-Shafranov PDE iteratively, which forces a trade-off between accuracy and computation time and makes them too slow for efficient experimentation and planning. Fusion researchers often need to explore a wide range of parameters and configurations to optimize plasma performance, a process that can take hours or even days and slows research progress.

In recent years, the fusion research community has begun exploring advanced techniques, including machine learning (ML), to address these challenges. Data-driven ML models can capture non-linear phenomena from limited input information, and because their forward computations are fast, they can significantly speed up plasma boundary prediction.

In this article, we share our experience in applying an ML-based surrogate model to predict the plasma boundary in a tokamak. We developed this model as a faster alternative to traditional plasma solvers. The approach significantly reduces computation time and can benefit a broad range of applications, including experiment analysis and planning. We cover the technical aspects of the model, the data used for training, and the methods we employed to optimize performance.

Key Concepts

In this article, we focus on the plasma boundary, which is defined by the last closed flux surface of the confined plasma. It represents the transition between the closed field lines in the high-temperature and high-density core region of the plasma and the open field lines representing the scrape-off layer of the tokamak.

The position of the plasma boundary, as defined by the last closed flux surface, is not measured directly in experiments. It is obtained from a plasma equilibrium reconstruction based on data from sensors installed in the tokamak. These calculations are performed by specialized equilibrium reconstruction codes such as EFIT [3, 4], the code most commonly used to reconstruct DIII-D discharge equilibria. In this work, we used data from EFIT01, which is based only on magnetic sensor data.

The primary drawback of such calculations is that they can be extremely time-consuming, especially when many repeated experiments are needed. Surrogate modeling offers a solution to this problem. Using this technique, we adopt a data-driven approach to train a machine learning model that can provide rapid plasma boundary approximations, offering a faster alternative to directly solving the Grad-Shafranov equation.
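For reference, the Grad-Shafranov equation mentioned above can be written in its standard axisymmetric form. The symbols are not defined elsewhere in this article and are introduced here only for illustration: ψ is the poloidal magnetic flux, p(ψ) the plasma pressure, F(ψ) = R·B_φ the poloidal current function, and μ0 the vacuum permeability, in cylindrical coordinates (R, Z).

```latex
\Delta^{*}\psi \;\equiv\;
R\,\frac{\partial}{\partial R}\!\left(\frac{1}{R}\,\frac{\partial \psi}{\partial R}\right)
+ \frac{\partial^{2}\psi}{\partial Z^{2}}
\;=\;
-\mu_{0} R^{2}\,\frac{dp}{d\psi} \;-\; F\,\frac{dF}{d\psi}
```

Because the right-hand side depends on ψ itself and the plasma boundary is not known in advance, the equation is non-linear and is typically solved iteratively, which is the cost a surrogate model aims to avoid.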

Dataset

Through our participation in the DIII-D User Program, we gained access to a rich database containing all historical DIII-D discharges. This includes sensor data, EFIT equilibrium reconstructions, magnetic control system commands, and plasma state information, covering a wide range of experimental conditions, configurations, and control strategies. The data is recorded in high-resolution time series through an array of diagnostic systems installed in the tokamak, which collect information on key parameters such as magnetic fields, plasma temperature, density, etc. These measurements are then used to infer various plasma profiles and plasma geometry.

The input to our surrogate model includes key factors that influence the plasma state: coil currents, plasma current, and loop voltage. The model’s output is the geometry of the plasma boundary at the same timestamp. The coil currents, plasma current, and loop voltage are measured during the discharge, while the plasma geometry is calculated automatically after the shot using the DIII-D equilibrium code, EFIT.

Fig. 1: Coil current dynamics in the DIII-D discharge #164920.

Fig. 2: Plasma current and loop voltage in the DIII-D discharge #164920.

Fig. 3: Plasma shape evolution in the DIII-D discharge #164920.

To train our model, we used a set of more than 15,000 shots similar to the one described in [5] and [6], covering the period from June 2006 to April 2022. Shots containing missing data were excluded from the training set. Some characteristics of the dataset are presented in Table 1.

Tab. 1: Ranges of key signals in the training dataset.
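The exclusion of shots with missing data can be sketched as below. This is only an illustration, not the filtering code used in this work: the dictionary layout, the signal names `ip` and `vloop`, and the shot identifiers are hypothetical, and "missing" is assumed to mean an absent signal, an empty signal, or a `None`/NaN sample.

```python
import math


def is_missing(value):
    """True for None or NaN samples (assumed definition of 'missing')."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_incomplete_shots(shots, required_signals):
    """Keep only shots in which every required signal is present and complete.

    `shots` maps a shot id to a dict of {signal name: list of samples}.
    """
    complete = {}
    for shot_id, signals in shots.items():
        ok = all(
            name in signals
            and len(signals[name]) > 0
            and not any(is_missing(v) for v in signals[name])
            for name in required_signals
        )
        if ok:
            complete[shot_id] = signals
    return complete


# Hypothetical toy data: shot 2 has a NaN sample, shot 3 lacks a signal.
toy_shots = {
    1: {"ip": [1.0, 1.1], "vloop": [0.2, 0.3]},
    2: {"ip": [1.0, float("nan")], "vloop": [0.2, 0.3]},
    3: {"ip": [1.0, 1.1]},
}
kept = drop_incomplete_shots(toy_shots, ["ip", "vloop"])
```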

To optimize the training process, we adopted a signal preprocessing method similar to the one outlined in [7]. Specifically, the plasma boundary is represented as a fixed-length vector of polar radius values (see Fig. 4).

Fig. 4: Sorting the plasma boundary points based on their polar angle (indicated by the color gradient). This creates a relationship between the radius length and the polar angle, which can be interpolated to obtain a fixed-length vector of radius values.
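The resampling described above can be sketched in plain Python as follows. This is a minimal illustration rather than the preprocessing code used in this work. The function name `boundary_to_radii`, the reference point `center` (assumed here to be the magnetic center), the number of angles, and the use of linear interpolation are all assumptions. The sketch also assumes that every ray from the center crosses the boundary exactly once.

```python
import math


def boundary_to_radii(points, center, n_angles=64):
    """Resample boundary (R, Z) points to radii on a uniform polar-angle grid."""
    r0, z0 = center
    # Polar angle in [0, 2*pi) and radius of each boundary point, sorted by angle.
    polar = sorted(
        (math.atan2(z - z0, r - r0) % (2 * math.pi), math.hypot(r - r0, z - z0))
        for r, z in points
    )
    # Pad with the last point one turn earlier and the first point one turn
    # later, so that interpolation wraps around the closed contour.
    first, last = polar[0], polar[-1]
    ext = (
        [(last[0] - 2 * math.pi, last[1])]
        + polar
        + [(first[0] + 2 * math.pi, first[1])]
    )

    radii = []
    j = 0
    for k in range(n_angles):
        theta = 2 * math.pi * k / n_angles
        # Advance to the segment [ext[j], ext[j + 1]] that contains theta.
        while ext[j + 1][0] < theta:
            j += 1
        (t0, rad0), (t1, rad1) = ext[j], ext[j + 1]
        w = 0.0 if t1 == t0 else (theta - t0) / (t1 - t0)
        radii.append(rad0 + w * (rad1 - rad0))
    return radii


# Usage: an unevenly sampled circle of radius 0.5 m around (1.7, 0.0).
angles = [0.1 + 0.17 * i for i in range(37)]
circle = [(1.7 + 0.5 * math.cos(a), 0.5 * math.sin(a)) for a in angles]
radii = boundary_to_radii(circle, center=(1.7, 0.0), n_angles=16)
```

Whatever the number of points in the original contour, the result has exactly `n_angles` entries, which is what makes it usable as a fixed-size regression target.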

Model Architecture and Training

For our model, we chose a transformer architecture similar to [8]. While transformers are commonly used for sequential data, their self-attention mechanism also suits tasks with single-timestamp inputs, such as plasma boundary prediction. In our case, the model uses coil currents, plasma current, and loop voltage to predict the plasma boundary geometry at a specific moment. The self-attention mechanism captures complex interactions between these input features, which helps the model make accurate predictions.

To predict the plasma boundary geometry, we incorporated two separate output heads into the model’s architecture. The first head is designed to regress the R and Z coordinates of the plasma’s magnetic center, while the second head outputs a vector of boundary radius values.

Fig. 5: Model architecture from [6] with two output heads — for center and radii prediction.

To train our model, we employed the mean squared error (MSE) loss function for both the “center” and “boundary” heads, as MSE has shown good results in predicting plasma parameters in [5, 7].
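As a rough sketch, a two-head MSE objective can be written as a weighted sum of the per-head errors. The article does not say how the two losses are combined, so the summation, the equal default weights, and the names `two_head_loss`, `center_weight`, and `boundary_weight` are assumptions made for illustration.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of numbers."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def two_head_loss(center_pred, center_true, radii_pred, radii_true,
                  center_weight=1.0, boundary_weight=1.0):
    """Weighted sum of the 'center' head MSE and the 'boundary' head MSE.

    The equal default weights are an assumption, not a setting from the article.
    """
    return (center_weight * mse(center_pred, center_true)
            + boundary_weight * mse(radii_pred, radii_true))


# Usage with made-up numbers: (R, Z) of the center and a 3-element radius vector.
loss = two_head_loss(
    center_pred=[1.70, 0.00], center_true=[1.70, 0.20],
    radii_pred=[0.50, 0.60, 0.55], radii_true=[0.50, 0.60, 0.55],
)
```

In a real training loop the same expression would be evaluated on batched tensors by the deep learning framework, which is not shown here.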

For better convergence, we used the SGD optimizer with a One-Cycle learning rate scheduler. This schedule gradually increases the learning rate to a peak during the first phase of training and then gradually decreases it. The One-Cycle policy has been shown to improve generalization and speed up convergence by letting the model explore larger learning rates early on while stabilizing the optimization in later stages. In our experience, it can help the model escape poor local minima and makes training faster and more stable, which benefits both output heads.
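The shape of such a schedule can be illustrated with a small standalone function. This is one common variant (linear warm-up followed by cosine decay), not necessarily the scheduler implementation or settings used in this work. The function name and the default values of `pct_start`, `div_factor`, and `final_div_factor` are assumptions.

```python
import math


def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.3, div_factor=25.0, final_div_factor=1e4):
    """Learning rate at a 0-based `step` for a one-cycle schedule (sketch)."""
    initial_lr = max_lr / div_factor          # starting learning rate
    min_lr = initial_lr / final_div_factor    # learning rate at the last step
    warmup_steps = max(1, round(pct_start * total_steps))
    if step < warmup_steps:
        # Phase 1: linear ramp from initial_lr up to max_lr.
        return initial_lr + (max_lr - initial_lr) * step / warmup_steps
    # Phase 2: cosine decay from max_lr down to min_lr.
    progress = (step - warmup_steps) / max(1, total_steps - 1 - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Usage: learning rates for a hypothetical 100-step run with a peak of 0.1.
lrs = [one_cycle_lr(s, total_steps=100, max_lr=0.1) for s in range(100)]
```

With these assumed defaults the rate starts at `max_lr / 25`, peaks after 30% of the steps, and ends several orders of magnitude below its starting value.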

Model Performance

To evaluate the model’s performance, we used the mean absolute error (MAE) between the predicted coordinates and those reconstructed by EFIT. Below, we plot the MAE on the validation subset during training, with one curve for the magnetic center and another for the plasma boundary prediction. Based on the MAE, the best model is the checkpoint saved after the 46th epoch, where an epoch is one complete pass through the entire training dataset. Once the model is trained, predictions are generated with a single forward pass through the model.

Fig. 6: MAE metric dynamics during model training: solid line — center prediction head, dashed line — boundary prediction head.

To further assess the model’s performance on the test set, we plotted an MAE histogram across all test shots. The results show that 95% of the shots have an MAE of less than 0.013 meters, indicating a high level of accuracy across the majority of samples.

Fig. 7: MAE metric distribution over the test shots.
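A statement such as "95% of the shots have an MAE below a threshold" corresponds to the 95th percentile of the per-shot MAE distribution. The sketch below shows one way to compute it. The helper names and the sample values are hypothetical and are not taken from the test set.

```python
def percentile(values, q):
    """q-th percentile (0 to 100), linearly interpolated between sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def fraction_below(values, threshold):
    """Share of values that are strictly below `threshold`."""
    return sum(v < threshold for v in values) / len(values)


# Hypothetical per-shot MAE values in meters, for illustration only.
shot_mae = [0.003, 0.004, 0.005, 0.006, 0.020]
p95 = percentile(shot_mae, 95)
share = fraction_below(shot_mae, 0.013)
```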

Additionally, here is a video demonstration of the plasma boundary prediction for shot #164920, using the model from the 46th epoch. The video showcases the model’s performance throughout the entire shot. In the top half, a comparison is shown between the model’s predictions (in red) and the ground truth data (in blue). The bottom plot tracks the current timestamp on the shot’s timeline, overlaid with the plasma current profile. For this shot, the model achieves an MAE of 0.003 meters.

Vid. 1: Plasma boundary prediction for shot #164920, using the model from the 46th epoch.

For a more detailed analysis of the model’s performance during this shot, we plotted the boundary MAE and worst absolute error (WAE) for each timestamp across the entire shot’s timeline. This plot provides insights into how the model’s accuracy varies over different phases of the plasma discharge.

Fig. 8: MAE and WAE dynamics during the test discharge #164920.
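The two per-timestamp metrics can be sketched as follows. The exact quantities compared in this work (boundary coordinates or radius vectors) are not spelled out, so the sketch assumes each prediction and each EFIT reference is a flat, equal-length sequence of values in meters. The function names are illustrative.

```python
def mae(pred, target):
    """Mean absolute error over one boundary vector."""
    errors = [abs(p - t) for p, t in zip(pred, target)]
    return sum(errors) / len(errors)


def wae(pred, target):
    """Worst (maximum) absolute error over one boundary vector."""
    return max(abs(p - t) for p, t in zip(pred, target))


def per_timestamp_metrics(pred_series, true_series):
    """List of (MAE, WAE) pairs, one pair per timestamp of a shot."""
    return [(mae(p, t), wae(p, t)) for p, t in zip(pred_series, true_series)]


# Usage with two made-up timestamps of a 3-element boundary vector.
metrics = per_timestamp_metrics(
    pred_series=[[0.50, 0.60, 0.55], [0.52, 0.61, 0.50]],
    true_series=[[0.50, 0.60, 0.55], [0.50, 0.60, 0.53]],
)
```

The WAE is never smaller than the MAE at the same timestamp, so a large gap between the two curves points to a localized error on part of the boundary rather than a uniform offset.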

Discussion and Conclusion

In this work, we developed an ML-based surrogate model to predict the plasma boundary and magnetic center using signals from DIII-D tokamak experiments. While our model demonstrated good overall performance in predicting the plasma boundary and the magnetic center (Fig. 9), there are some discharges where it struggles (Fig. 10). Below, we present examples of both cases.

Fig. 9: Example cases where the model demonstrates accurate plasma boundary predictions.

Fig. 10: Example cases where the model struggles with plasma boundary prediction.

Most cases of poor prediction performance occur when the plasma experiences some form of instability, which can happen during the ramp-up and ramp-down phases, or even during the flattop phase. These instabilities are often visible on the plasma current plot. Another reason for the poor performance is that the model does not account for the geometry of the tokamak’s limiter, which physically bounds the plasma. Additionally, incorporating data from probes and loops could provide the model with more information about the plasma shape. However, this approach might reduce the model’s applicability, as in some cases, only the initial discharge configuration and actuator actions are available.

A more desirable approach to improving the model would be to enable it to process a sequence of plasma states from the very beginning of the discharge. This would allow the model to have an internal representation of the plasma state, providing sufficient information to predict not only the plasma boundary but also the full magnetic field distribution inside the vessel. Recurrent architectures like RNNs, LSTMs, or GRUs could be explored for this purpose. Future work could also focus on extending the model’s capabilities to predict plasma boundaries at future timestamps.

Currently, the surrogate model significantly reduces computation time. The model can be run on a CPU, taking less than 2 seconds to process a shot with ~7,000 timestamps on an AMD EPYC 7402P 24-Core Processor, while an NVIDIA GeForce RTX 3090 GPU takes less than 1 second. This speed advantage allows for more efficient experiment planning, real-time data analysis, and could also support reinforcement learning algorithms that rely on rapid simulations to optimize plasma control strategies.
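As a back-of-the-envelope check, the per-shot timings above can be converted into an average cost per timestamp. The calculation assumes the time is spread evenly over the roughly 7,000 timestamps and uses the quoted figures as upper bounds. The function name is illustrative.

```python
def per_timestamp_latency_ms(total_seconds, n_timestamps):
    """Average time per timestamp in milliseconds."""
    return 1000.0 * total_seconds / n_timestamps


# Upper bounds derived from "less than 2 s" (CPU) and "less than 1 s" (GPU).
cpu_upper_ms = per_timestamp_latency_ms(2.0, 7000)
gpu_upper_ms = per_timestamp_latency_ms(1.0, 7000)
```

Under these assumptions, the average cost stays below roughly 0.29 ms per timestamp on the CPU and below roughly 0.15 ms on the GPU.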

Fusion Twin Platform

You can run the surrogate model and see the results in action on the Fusion Twin Platform, https://fusiontwin.io/.

The Platform is an online service for running highly customizable magnetic equilibrium simulations using NSFsim and digital replicas of DIII-D, ISTTOK, NSF NTT, and other tokamaks. It also lets you visualize uploaded and simulated files in HDF5 format and interact with the data using integrated Jupyter notebooks. See our recent blog post for more information about the Platform.

We invite you to be part of this groundbreaking journey. Follow our blog, subscribe to our LinkedIn for regular updates, or reach out to us directly to discuss potential collaborations.

References

[1] Ambrosino, G., and R. Albanese. “Magnetic Control of Plasma Current, Position, and Shape in Tokamaks: A Survey or Modeling and Control Approaches.” IEEE Control Systems 25, no. 5 (October 2005): 76–92. https://doi.org/10.1109/mcs.2005.1512797.
[2] Korenev, P., A. Prokhorov, N. Kartsev, M. Patrov, A. E., Pavlova, Y. Mitrishkin, et al. “Plasma Control in Tokamaks. Part. 2. Plasma Magnetic Control Systems,” 2019. https://www.semanticscholar.org/paper/Plasma-Control-in-Tokamaks.-Part.-2.-Plasma-Control-Korenev-Prokhorov/39e6db1d8ecdbb494285ba9a8033a6a9e18bb96b.
[3] L L Lao, H St John, R D Stambaugh, A G Kellman, and W Pfeiffer. Reconstruction of current profile parameters and plasma shapes in tokamaks. Nuclear Fusion, 25(11):1611, 1985.
[4] L L Lao, S Kruger, C Akcay, P Balaprakash, T A Bechtel, E Howell, J Koo, J Leddy, M Leinhauser, Y Q Liu, S Madireddy, J McClenaghan, D Orozco, A Pankin, D Schissel, S Smith, X Sun, and S Williams. Application of machine learning and artificial intelligence to extend EFIT equilibrium reconstruction. Plasma Physics and Controlled Fusion, 64(7), 074001, 2022.
[5] Abbate, J., Conlin, R., & Kolemen, E. (2021). Data-driven profile prediction for DIII-D. Nuclear Fusion, 61(4), 046027.
[6] Char, I., Abbate, J., Bardóczi, L., Boyer, M., Chung, Y., Conlin, R., … & Schneider, J. (2023, June). Offline model-based reinforcement learning for tokamak control. In Learning for Dynamics and Control Conference (pp. 1357–1372). PMLR.
[7] Wan, Chenguang, et al. “Predict the last closed-flux surface evolution without physical simulation.” Nuclear Fusion 64.2 (2024): 026014.
[8] Wan, C., Yu, Z., Pau, A., Sauter, O., Liu, X., Yuan, Q., & Li, J. (2023). A machine-learning-based tool for last closed-flux surface reconstruction on tokamaks. Nuclear Fusion, 63(5), 056019.

Acknowledgment

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Fusion Energy Sciences, using the DIII-D National Fusion Facility, a DOE Office of Science user facility, under Award(s) DE-FC02-04ER54698.

Disclaimer

This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.