Newton's Lantern: RL Framework for AC Power Flow Warm Start
Newton's Lantern is a reinforcement learning framework that finetunes neural warm start models for AC power flow using group relative policy optimization (GRPO) and a learned reward model. It targets the poor generalization of supervised methods on heavily loaded instances near voltage collapse. A theoretical lower bound on the number of Newton-Raphson iterations, which depends on the direction of the warm start error rather than its magnitude, explains this failure mode near saddle-node bifurcations. Newton's Lantern uses the iteration count as its supervisory signal and outperforms baselines on the IEEE 118-bus, GOC 500-bus, and GOC 2000-bus benchmarks.
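To make the direction-versus-magnitude point concrete, the sketch below uses a scalar toy problem rather than anything from the paper: a lossless two-bus system whose normalized power balance is sin(θ) = p, with the saddle-node bifurcation at p = 1 where the Jacobian cos(θ) vanishes. The function name `newton_iterations`, the tolerance, the loading level p = 0.999, and the offset 0.04 are all illustrative assumptions. In one dimension "direction" reduces to the sign of the error, so this is only an analogy for the multi-dimensional bound.

```python
import math

def newton_iterations(p, theta0, tol=1e-10, max_iter=100):
    """Count Newton-Raphson updates needed to solve sin(theta) = p.

    Toy stand-in for an AC power flow solve; not the paper's benchmark setup.
    Returns max_iter if the mismatch never drops below tol.
    """
    theta = theta0
    for k in range(max_iter):
        mismatch = math.sin(theta) - p
        if abs(mismatch) < tol:
            return k
        # The Jacobian is cos(theta); it approaches zero near theta = pi/2.
        theta -= mismatch / math.cos(theta)
    return max_iter

p = 0.999                      # heavily loaded: close to the limit p = 1
theta_star = math.asin(p)      # exact solution on the stable branch
offset = 0.04                  # same error magnitude for both warm starts

iters_below = newton_iterations(p, theta_star - offset)
iters_above = newton_iterations(p, theta_star + offset)

# Flat start (theta = 0) at light and heavy loading, for comparison.
iters_flat_light = newton_iterations(0.5, 0.0)
iters_flat_heavy = newton_iterations(p, 0.0)

print("warm start below solution:", iters_below)
print("warm start above solution:", iters_above)
print("flat start, p = 0.5:", iters_flat_light)
print("flat start, p = 0.999:", iters_flat_heavy)
```

Both warm starts are equally far from the solution, but the one above it takes its first step where cos(θ) is nearly zero, is thrown far from the solution, and needs more iterations to recover. A loss that only measures error magnitude cannot distinguish the two.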
Key facts
- Newton's Lantern is a reinforcement learning framework for finetuning AC power flow warm start models.
- It combines group relative policy optimization with a learned reward model trained on perturbations of base model predictions.
- The method uses iteration count as the supervisory signal.
- A theoretical lower bound on Newton-Raphson iterations depends on error direction, not magnitude.
- The bound becomes vacuous as the smallest singular value of the power-flow Jacobian shrinks.
- Supervised regression, which minimizes error magnitude, fails near the saddle-node bifurcation because of this direction dependence.
- Newton's Lantern was tested on IEEE 118-bus, GOC 500-bus, and GOC 2000-bus benchmarks.
- Among the methods compared, it is the only one that achieves robust performance across all three benchmarks.
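The group-relative step in GRPO can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the reward is taken to be the negative of a measured Newton-Raphson iteration count, whereas the framework described above scores candidates with a learned reward model. The function name `group_relative_advantages` and the sample counts are hypothetical.

```python
from statistics import mean, pstdev

def group_relative_advantages(iteration_counts, eps=1e-8):
    """Standardize rewards within one group of sampled warm starts.

    Assumption: reward = -iteration_count, so fewer solver iterations
    means a higher reward. eps guards against a zero standard deviation
    when every candidate in the group performs identically.
    """
    rewards = [-float(n) for n in iteration_counts]
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical iteration counts for four warm starts sampled for one instance.
counts = [4, 6, 9, 5]
advantages = group_relative_advantages(counts)
print(advantages)
```

Candidates that converge in fewer iterations than the group average get a positive advantage and are reinforced; slower ones get a negative advantage. Because the baseline is the group mean, no separate value network is needed.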