RMH Research
RMH Studios
RMH Studios Technical Reports, Vol. 4, Issue 3, pp. 1–28, February 20, 2026
DOI: 10.1098/rmh.2026.0087
The dynamics of multi-agent reinforcement learning (MARL) in adversarial game environments exhibit phenomena that are strikingly reminiscent of non-equilibrium statistical-mechanical systems: spontaneous symmetry breaking, phase transitions between cooperative and defection-dominated equilibria, and critical slowing down near bifurcation manifolds. We develop a mean-field theoretic framework that maps the joint policy-gradient dynamics of N interacting agents onto a system of coupled Fokker–Planck equations governing the evolution of policy-parameter probability densities in a high-dimensional strategy space. Assuming weak coupling and Gaussian fluctuations, we derive closed-form expressions for the order parameter, susceptibility, and correlation length of the agent population, and show that the system undergoes a continuous phase transition at a critical reward-coupling strength whose value depends on the spectral radius of the agent interaction graph. Numerical simulations in 64-agent adversarial capture-the-flag environments corroborate the mean-field predictions: the measured critical exponents (β = 0.51 ± 0.03, γ = 0.98 ± 0.05, ν = 0.49 ± 0.04) are consistent with mean-field universality, and training instabilities previously attributed to non-stationarity are reinterpreted as critical fluctuations near the phase boundary. A renormalization-group-inspired curriculum that gradually increases the reward coupling through the critical point reduces training variance by 58% and wall-clock convergence time by 34% relative to standard independent-learner baselines.
Keywords: statistical mechanics, multi-agent reinforcement learning, phase transitions, Fokker–Planck equation, mean-field theory, critical phenomena, adversarial games
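The claim that the critical coupling is set by the spectral radius of the interaction graph, and that the order parameter exponent takes its mean-field value β = 1/2, can be illustrated with a much simpler analogue. The sketch below is an assumption-laden toy, not the Fokker–Planck framework described above. It uses a Curie–Weiss-style self-consistency m_i = tanh(K Σ_j A_ij m_j) on a graph A. The names `spectral_radius`, `order_parameter`, `ring`, the coupling `K`, and the scalar "alignment" m_i are all hypothetical. Linearising around m = 0 gives m = K A m, so the disordered state loses stability at K_c = 1/λ_max(A).

```python
import math

# Hypothetical toy model (NOT the paper's Fokker-Planck derivation): each agent i
# carries a scalar "policy alignment" m_i obeying the mean-field self-consistency
#   m_i = tanh(K * sum_j A_ij m_j),
# where A is the agent interaction graph and K a dimensionless reward coupling.
# Linearising around m = 0 gives m = K A m, so the disordered state loses
# stability when K * lambda_max(A) = 1, i.e. K_c = 1 / lambda_max(A).


def spectral_radius(adj, iters=200):
    """Largest eigenvalue of a symmetric non-negative matrix via power iteration."""
    n = len(adj)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm_w = math.sqrt(sum(x * x for x in w))
        norm_v = math.sqrt(sum(x * x for x in v))
        lam = norm_w / norm_v  # Rayleigh-style growth ratio ||A v|| / ||v||
        v = [x / norm_w for x in w]
    return lam


def order_parameter(adj, K, iters=20000, tol=1e-13):
    """Population-averaged m from fixed-point iteration of the self-consistency."""
    n = len(adj)
    m = [0.5] * n  # symmetry-broken initial guess
    for _ in range(iters):
        new = [math.tanh(K * sum(adj[i][j] * m[j] for j in range(n))) for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, m)) < tol
        m = new
        if converged:  # iteration slows markedly as K approaches K_c
            break
    return sum(m) / n


def ring(n):
    """Adjacency matrix of an n-agent ring (each agent interacts with two neighbours)."""
    return [[1.0 if abs(i - j) in (1, n - 1) else 0.0 for j in range(n)] for i in range(n)]


adj = ring(10)
lam = spectral_radius(adj)
K_c = 1.0 / lam

for factor in (0.8, 1.2):
    print(f"K/K_c = {factor}: m = {order_parameter(adj, factor * K_c):.4f}")

# Crude estimate of beta from m ~ (K - K_c)^beta just above the critical point.
eps = (0.01, 0.02)
ms = [order_parameter(adj, (1 + e) * K_c) for e in eps]
beta_est = math.log(ms[1] / ms[0]) / math.log(eps[1] / eps[0])
print(f"K_c = {K_c:.4f}, beta_est = {beta_est:.3f}")
```

For the 10-agent ring, λ_max = 2, so K_c = 0.5. Below K_c the order parameter relaxes to zero. Above it, a nonzero m appears continuously, and the two-point slope comes out close to 1/2. The fixed-point iteration also needs many more steps just above K_c than far from it, which is a small-scale echo of critical slowing down. Swapping `ring` for a denser graph raises λ_max and lowers K_c accordingly.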