Plasma control & learned forecasting

Reinforcement learning for magnetic confinement; learned weather models that run in seconds, not hours.

[Image: A radiant iridescent plasma vortex of contained energy]

Two of the hardest problems in the energy and climate portfolio share a hidden structure. Holding a fusion plasma stable inside a magnetic cage and forecasting the global atmosphere days ahead look nothing alike, yet both reduce to the same shape: a fast, turbulent, high-dimensional physical system whose governing equations we know but cannot afford to integrate quickly enough to act on. In each case the bottleneck is not physics we lack but computation we cannot run in time. And in each case a learned model — trained against simulation, against reanalysis data, or against the device itself — turns an intractable forward problem into something fast enough to control, optimize, or forecast at scale. That is the thread that ties plasma control to weather prediction to the dispatch of a power grid, and it is why a single research program holds them together.

Fusion as a control problem

  • Thousands of decisions per second: tokamak field coils must be adjusted thousands of times a second to keep the plasma stable, centered, and in its target shape.
  • ~1000× faster forecasts: a learned global weather model produces a forecast in seconds on a single accelerator, where physics-based systems need supercomputer-hours.
  • A few percent of a very large number: each sustainability win (cooling, grid dispatch, industrial process) is a small fraction of an enormous global quantity, and so is itself large.

Reinforcement learning for magnetic confinement

Magnetic-confinement fusion promises abundant clean energy, but only if a plasma hotter than the core of the sun can be held in a precise magnetic cage. In a tokamak that cage is shaped by dozens of field coils whose currents have to be tuned thousands of times per second against a turbulent, nonlinearly coupled medium prone to instabilities that can quench the reaction or damage the vessel in milliseconds. The classical approach hand-engineers a separate feedback loop for each plasma quantity — a laborious craft that must be redone for every new shape and that fights the strong couplings between channels rather than exploiting them.

We treat tokamak control instead as a deep reinforcement-learning problem. An agent is trained inside a fast, differentiable simulator of the plasma's magnetohydrodynamic evolution to map magnetic measurements directly to coil-voltage commands, learning one unified controller that reaches and holds a target configuration rather than a patchwork of tuned loops. Because the simulator is imperfect, sim-to-real transfer is the central challenge: we randomize plasma parameters and model the actuator and sensor characteristics of the real device so the policy is robust to the reality gap. The scientific payoff is twofold.

  • Better confinement and stability — a controller that shapes the plasma more precisely improves the physics directly, holding configurations that are difficult to reach by classical means.
  • The tokamak as an instrument — learned controllers have sustained elongated shapes, negative-triangularity geometries, and even multiple separated plasma "droplets," turning the reactor into an experimental apparatus for studying confinement itself.
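
The domain-randomization idea described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the one-dimensional unstable "plasma position" model, the parameter ranges, the proportional-feedback policy, and the random-search trainer are stand-ins, not the simulator, the policy network, or the reinforcement-learning algorithm used on a real tokamak. What it does show is the training pattern: the controller is scored across many randomized variants of the simulated device, then checked on variants it has never seen.

```python
import random

def sample_scenarios(rng, n):
    """Draw n randomized toy 'devices': (growth rate, actuator gain, sensor noise, noise seed)."""
    return [
        (rng.uniform(2.0, 8.0), rng.uniform(0.7, 1.3),
         rng.uniform(0.0, 0.01), rng.randrange(2 ** 31))
        for _ in range(n)
    ]

def episode_cost(gain, scenario, steps=200, dt=0.01):
    """Mean squared displacement of a toy unstable system under proportional feedback."""
    growth, actuator_gain, sensor_noise, noise_seed = scenario
    noise = random.Random(noise_seed)       # fixed seed: same noise for every candidate gain
    x, cost = 0.1, 0.0                      # start slightly off target
    for _ in range(steps):
        measured = x + noise.gauss(0.0, sensor_noise)   # imperfect sensor
        u = max(-10.0, min(10.0, -gain * measured))     # saturating actuator
        x += dt * (growth * x + actuator_gain * u)      # explicit Euler step
        cost += x * x
    return cost / steps

def mean_cost(gain, scenarios):
    """Average episode cost over a set of randomized devices."""
    return sum(episode_cost(gain, s) for s in scenarios) / len(scenarios)

def train_gain(scenarios, rng, iterations=150):
    """Random-search 'training': keep a perturbed gain only if it lowers the randomized cost."""
    best_gain = 0.0
    best_cost = mean_cost(best_gain, scenarios)
    for _ in range(iterations):
        candidate = best_gain + rng.gauss(0.0, 1.0)
        cost = mean_cost(candidate, scenarios)
        if cost < best_cost:
            best_gain, best_cost = candidate, cost
    return best_gain, best_cost

rng = random.Random(0)
train_set = sample_scenarios(rng, 16)   # randomized devices seen in training
test_set = sample_scenarios(rng, 16)    # held-out devices standing in for "reality"
gain, train_cost = train_gain(train_set, rng)
untrained_cost = mean_cost(0.0, test_set)
trained_cost = mean_cost(gain, test_set)
print(f"learned gain {gain:.2f}; held-out cost {trained_cost:.3g} vs {untrained_cost:.3g} with no control")
```

The held-out set plays the role of the real device: a controller that only works for one exact set of simulator parameters would fail there, while one trained across the randomized range has a chance of carrying over.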

The reactor design loop

Control is only one lever. We also train surrogates that predict plasma behavior — the onset of disruptions, the transport of heat and particles, the structure of the edge pedestal — far faster than first-principles simulation, enabling real-time disruption avoidance and the optimization of operating scenarios. At the longest horizon those same fast surrogates can be embedded in the reactor design loop, so that coil placement, field geometry, and operating point are optimized jointly against physics objectives that would be unaffordable to evaluate by direct simulation. The pattern is always the same: a learned model compresses an expensive computation into a fast, differentiable function, and that compression is what makes both optimization and real-time control possible at all.
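
As a rough sketch of that pattern, the example below fits a cheap surrogate to a handful of evaluations of an "expensive" objective and then optimizes a design parameter by following the surrogate's gradient. The objective `expensive_simulation`, the single design variable, and the quadratic least-squares surrogate are all hypothetical stand-ins chosen to keep the example self-contained; a real design loop would use a learned plasma surrogate over many coupled parameters.

```python
import math

def expensive_simulation(x):
    """Hypothetical stand-in for a costly physics evaluation of one design parameter."""
    return (x - 0.6) ** 2 + 0.02 * math.cos(10.0 * x)

def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ a + b*x + c*x^2 via the normal equations."""
    feats = [[1.0, x, x * x] for x in xs]
    gram = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    rhs = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(3)]
    return solve_linear(gram, rhs)

# A small budget of expensive evaluations is spent once, up front.
xs = [i / 8 for i in range(9)]
ys = [expensive_simulation(x) for x in xs]
a, b, c = fit_quadratic(xs, ys)

# The surrogate is cheap and differentiable, so the design can follow its gradient.
initial_design = 0.1
design = initial_design
for _ in range(200):
    design -= 0.1 * (b + 2.0 * c * design)   # gradient of a + b*x + c*x^2

print(f"surrogate optimum at x = {design:.3f}")
print(f"objective there: {expensive_simulation(design):.4f}, "
      f"at the initial design: {expensive_simulation(initial_design):.4f}")
```

The final line re-evaluates the true objective at the surrogate's optimum. That check matters because the surrogate smooths over structure it could not fit, so its optimum is a candidate to validate rather than a guaranteed answer.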

[Image: A tokamak's nested magnetic field coils glowing around a confined plasma core]

A learned weather surrogate is not approximating the equations; it is approximating the atmosphere, the thing we actually care about. Learning the system rather than the model of the system is the deepest reason a learned forecast can exceed, and not merely accelerate, classical simulation.

Learned weather forecasting

Numerical weather prediction is one of the triumphs of computational science: solve the equations of atmospheric fluid dynamics on a global grid and forecast the weather days ahead. It is also enormously expensive, demanding the largest supercomputers and hours of computation per run. We have shown that a neural network trained on decades of reanalysis data — the historical record of the atmosphere, reconstructed by assimilating billions of observations into physical models — can match or exceed the accuracy of the leading physics-based systems while producing a global forecast in seconds on a single accelerator. The model learns to advance the atmospheric state forward in time, capturing the dynamics implicitly from data rather than integrating the primitive equations step by step.
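
Advancing the state forward in time with a learned step is an autoregressive rollout, and a toy version fits in a few lines. The sketch below is an illustration under heavy assumptions: the "atmosphere" is a one-dimensional periodic field, the true dynamics are a fixed three-point stencil (`TRUE_STENCIL`, an invented set of weights), and the "model" is three weights fitted by gradient descent rather than a neural network trained on reanalysis. It shows only the mechanism: learn a one-step map from before-and-after pairs, then apply it repeatedly to produce a forecast.

```python
import random

TRUE_STENCIL = (0.3, 0.6, 0.1)   # hypothetical dynamics: weights on left, centre, right neighbours

def step(state, stencil):
    """Advance a periodic 1-D field by one time step with a 3-point stencil."""
    n = len(state)
    left, centre, right = stencil
    return [
        left * state[i - 1] + centre * state[i] + right * state[(i + 1) % n]
        for i in range(n)
    ]

def fit_stencil(before, after, lr=0.5, iterations=500):
    """Learn the one-step map from a (state, next state) pair by gradient descent on squared error."""
    n = len(before)
    w = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0, 0.0, 0.0]
        for i in range(n):
            feats = (before[i - 1], before[i], before[(i + 1) % n])
            err = sum(wk * fk for wk, fk in zip(w, feats)) - after[i]
            for k in range(3):
                grad[k] += err * feats[k] / n
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return tuple(w)

def rollout(state, stencil, steps):
    """Autoregressive forecast: feed each predicted state back in as the next input."""
    for _ in range(steps):
        state = step(state, stencil)
    return state

rng = random.Random(0)
training_state = [rng.gauss(0.0, 1.0) for _ in range(128)]        # stand-in for historical data
learned = fit_stencil(training_state, step(training_state, TRUE_STENCIL))

initial = [1.0 if 10 <= i < 20 else 0.0 for i in range(128)]      # a state unlike the training data
forecast = rollout(initial, learned, 40)
truth = rollout(initial, TRUE_STENCIL, 40)
max_error = max(abs(f - t) for f, t in zip(forecast, truth))
print("learned stencil:", tuple(round(w, 4) for w in learned))
print("largest forecast error after 40 steps:", max_error)
```

Because each step consumes the previous prediction, small one-step errors can compound over a long rollout. The toy case is benign because its dynamics are linear and learned almost exactly; real learned forecasts have to manage that error growth.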

Why should a learned surrogate beat the simulation it was trained to imitate? Because the explicit solver cannot afford to resolve the fast, small-scale processes that shape the weather, and so parameterizes them crudely; the learned model absorbs the statistical effect of those unresolved processes straight from the data. It distills the dynamics of a physics-based system into a function approximator and, in doing so, models the atmosphere rather than the equations. The thousand-fold speedup is more than a convenience. It changes what is possible: large ensembles that quantify uncertainty, rapid scenario exploration, and forecasting at a cadence and resolution physics-based systems simply cannot afford.

[Image: A global atmospheric forecast rendered as swirling pressure systems and cyclone tracks over the ocean]
Generative ensembles model the full distribution of future weather rather than a single trajectory, producing physically consistent samples from which the likelihood of cyclones, heat waves, and extreme precipitation can be read directly.

Extremes, ensembles, and downscaling

Accuracy on the typical day matters less than skill on the dangerous one. We focus particular effort on extreme events (cyclone tracks and intensities, heat waves, atmospheric rivers, extreme precipitation), where forecast skill translates directly into lives and property protected. Generative ensemble approaches model the full probability distribution of future weather rather than a single trajectory, producing calibrated, physically consistent samples from which the probability of an extreme can be read off. We are also extending the horizon, pushing learned models toward subseasonal and seasonal prediction, where predictability comes from slowly varying boundary conditions like ocean temperatures and soil moisture. The ultimate target is climate-scale emulation: fast surrogates of climate models that allow many more scenarios to be explored, coarse projections to be downscaled to local, impact-relevant resolution, and the deep uncertainties of long-range projection to be characterized far better than today.
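
Reading the probability of an extreme off an ensemble amounts to counting members. The sketch below assumes a made-up generator, `sample_member`, that produces one hypothetical wind-speed trajectory per call from a simple autoregressive process; it stands in for a generative weather model, and the thresholds and units are illustrative only.

```python
import random

def sample_member(rng, steps=48):
    """One hypothetical ensemble member: peak of a toy AR(1) wind-speed trajectory (m/s)."""
    value = 20.0
    peak = value
    for _ in range(steps):
        value = 20.0 + 0.9 * (value - 20.0) + rng.gauss(0.0, 3.0)
        peak = max(peak, value)
    return peak

def exceedance_probability(peaks, threshold):
    """Fraction of ensemble members whose peak exceeds the threshold."""
    return sum(p > threshold for p in peaks) / len(peaks)

rng = random.Random(0)
peaks = [sample_member(rng) for _ in range(500)]   # a 500-member ensemble
p_strong = exceedance_probability(peaks, 30.0)
p_extreme = exceedance_probability(peaks, 40.0)
print(f"P(peak > 30 m/s) = {p_strong:.3f}")
print(f"P(peak > 40 m/s) = {p_extreme:.3f}")
```

The estimate is only as fine-grained as the ensemble is large: with 500 members, an event rarer than about one in 500 will often show a count of zero. That is one reason fast samplers matter for extremes.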

Sustainability as optimization

Beyond prediction, learned models optimize the physical systems through which humanity actually uses energy. We apply reinforcement learning and predictive control to cut the energy consumed by large facilities, for example in data-center cooling; to improve the forecasting and dispatch of renewable generation so that intermittent wind and solar become more usable on the grid; and to tune industrial processes for efficiency and lower emissions. The common structure is a controllable physical system with expensive dynamics, a measurable objective, and ample telemetry, which is precisely the setting where a learned model of the system enables control that classical methods cannot match. Sustainability, framed this way, is a portfolio of optimization problems, each one a few percent of a very large global quantity; and a few percent of a very large quantity is itself very large.
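
A minimal sketch of predictive control in this setting: at every step the controller uses a model of the system to score short candidate plans, applies the first action of the best plan, and then replans. All specifics here are assumptions for illustration, including the cooling levels, the heat load, the temperature limit, the quadratic energy cost, and the hand-written `model` function, which stands in for a thermal model learned from telemetry.

```python
from itertools import product

LEVELS = (0, 1, 2, 3, 4)   # hypothetical discrete cooling settings
HEAT_LOAD = 2.0            # assumed temperature rise per step with no cooling (deg C)
LIMIT = 27.0               # assumed maximum allowed temperature (deg C)

def model(temp, level):
    """Stand-in for a learned thermal model: next temperature for a given cooling level."""
    return temp + HEAT_LOAD - 1.0 * level

def plan_cost(temp, plan):
    """Energy use (quadratic in cooling level) plus a large penalty for overheating."""
    cost = 0.0
    for level in plan:
        temp = model(temp, level)
        cost += level ** 2
        if temp > LIMIT:
            cost += 1e6
    return cost

def mpc_action(temp, horizon=3):
    """Score every plan over the horizon and return the first action of the cheapest one."""
    best_plan = min(product(LEVELS, repeat=horizon), key=lambda plan: plan_cost(temp, plan))
    return best_plan[0]

def run(policy, steps=24, temp=24.0):
    """Simulate a policy; return total energy used and the hottest temperature reached."""
    energy, hottest = 0.0, temp
    for _ in range(steps):
        level = policy(temp)
        temp = model(temp, level)
        energy += level ** 2
        hottest = max(hottest, temp)
    return energy, hottest

mpc_energy, mpc_hottest = run(mpc_action)
full_energy, full_hottest = run(lambda temp: LEVELS[-1])   # baseline: always maximum cooling
print(f"predictive control: energy {mpc_energy:.0f}, hottest {mpc_hottest:.1f} C")
print(f"always-max cooling: energy {full_energy:.0f}, hottest {full_hottest:.1f} C")
```

Exhaustive search over plans is only feasible because the toy has five actions and a three-step horizon. Larger problems need a smarter optimizer or a learned policy, but the replan-every-step structure is the same.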

Open questions

  • Closing the reality gap — how far can sim-to-real domain randomization carry a plasma controller before the policy must learn on the device itself, and what does safe on-device learning look like?
  • Physical consistency at long range — can learned forecasts remain stable and conservation-respecting out to seasonal and climate horizons, or do hybrid physics-plus-learning schemes remain necessary?
  • Honest uncertainty on extremes — how do we calibrate ensembles for the rare, high-impact events that lie at the thin tails of the training distribution, where verification is scarcest?
  • From surrogate to design — how far can differentiable plasma surrogates push the reactor design loop before predictions must be validated against a real device?
  • Aggregating the percents — what coordination is needed for thousands of local optimization wins, across grids and facilities, to compound into globally meaningful emissions reductions?