RMH Studios Development Team
RMH Studios
RMH Studios Technical Reports, Vol. 3, Issue 2, pp. 1-22 — November 2, 2025
DOI: 10.1098/rmh.2025.0287
Procedural content generation presents a fundamental challenge for reinforcement learning (RL) agents: environments that are never encountered twice demand robust generalization rather than trajectory memorization. We investigate the training and transfer performance of Proximal Policy Optimization (PPO) agents across three environment configurations of increasing stochasticity (static dungeon layouts, procedurally generated layouts with fixed seeds, and fully randomized procedural generation) within a custom roguelike testbed modeled after commercial dungeon-crawling games. Over 10 million training timesteps with five independent runs per configuration, agents trained on fully randomized environments exhibited slower initial reward accumulation but superior zero-shot generalization to novel layouts (78% completion on hard unseen levels vs. 12% for static-trained agents). A curriculum-learning protocol that incrementally transitions from static to randomized environments achieved the best overall performance, combining fast early learning with strong transfer (82% completion on hard unseen levels). Behavioral analysis revealed emergent exploration strategies, resource-management heuristics, and adaptive combat tactics in procedurally trained agents that were absent in their static-trained counterparts. These results provide actionable guidelines for training NPC agents in commercial games with procedurally generated content.
Keywords: reinforcement learning, procedural generation, curriculum learning, roguelike, PPO, generalization