Newton's Lantern: RL Framework for AC Power Flow Warm Start

other · 2026-05-13

A new reinforcement learning framework, Newton's Lantern, finetunes neural warm start models for AC power flow using group relative policy optimization (GRPO) and a learned reward model. It targets the poor generalization of supervised methods on heavily loaded instances near voltage collapse. A theoretical lower bound on the Newton-Raphson iteration count, which depends on the direction of the error rather than its magnitude, explains this failure mode near saddle-node bifurcations. Newton's Lantern uses the iteration count as its supervisory signal and outperforms baselines on the IEEE 118-bus, GOC 500-bus, and GOC 2000-bus benchmarks.
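To make the idea of iteration count as a signal concrete, here is a minimal sketch that counts Newton-Raphson updates on a toy two-bus system. It is an illustration only, not the paper's setup: the lossless line, the reactance `X = 0.1`, the zero reactive load, and the helper names are all assumptions. In this toy system the maximum loadability is 1/(2X) = 5 p.u., so a load of 4.9 p.u. sits close to voltage collapse.

```python
import numpy as np

X = 0.1  # hypothetical line reactance in per unit (assumption)

def residual(theta, v, p_load):
    """Power mismatch at the load bus; slack bus fixed at 1.0 p.u., angle 0."""
    return np.array([
        (v / X) * np.sin(theta) + p_load,    # active power balance
        v**2 / X - (v / X) * np.cos(theta),  # reactive power balance (zero load)
    ])

def jacobian(theta, v):
    """Derivatives of the residual with respect to (theta, v)."""
    return np.array([
        [(v / X) * np.cos(theta), np.sin(theta) / X],
        [(v / X) * np.sin(theta), (2 * v - np.cos(theta)) / X],
    ])

def newton_iterations(theta, v, p_load, tol=1e-8, max_iter=50):
    """Number of Newton-Raphson updates applied before the mismatch drops below tol."""
    for k in range(max_iter):
        f = residual(theta, v, p_load)
        if np.linalg.norm(f) < tol:
            return k
        d_theta, d_v = np.linalg.solve(jacobian(theta, v), -f)
        theta, v = theta + d_theta, v + d_v
    return max_iter

def exact_solution(p_load):
    """Closed-form high-voltage solution of the toy system (valid for p_load <= 1/(2X))."""
    v = np.sqrt((1 + np.sqrt(1 - 4 * (p_load * X) ** 2)) / 2)
    return -np.arcsin(p_load * X / v), v

light = newton_iterations(0.0, 1.0, p_load=1.0)  # flat start, light load
heavy = newton_iterations(0.0, 1.0, p_load=4.9)  # flat start, near collapse
warm = newton_iterations(*exact_solution(4.9), p_load=4.9)  # ideal warm start
print(light, heavy, warm)
```

The flat start needs more updates on the heavily loaded case than on the lightly loaded one, while a perfect warm start needs none. A warm start model is trying to close that gap.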

Key facts

Newton's Lantern is a reinforcement learning framework for finetuning AC power flow warm start models.
It combines group relative policy optimization with a learned reward model trained on perturbations of base model predictions.
The method uses the Newton-Raphson iteration count as the supervisory signal.
A theoretical lower bound on Newton-Raphson iterations depends on error direction, not magnitude.
The bound becomes vacuous as the smallest singular value of the power-flow Jacobian shrinks.
Near the saddle-node bifurcation, where the power-flow Jacobian approaches singularity, supervised regression fails because of this direction dependence.
Newton's Lantern was tested on IEEE 118-bus, GOC 500-bus, and GOC 2000-bus benchmarks.
Among the methods compared, it is the only one reported to perform robustly across all three benchmarks.
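As a rough illustration of how an iteration count could feed group relative policy optimization, the sketch below standardizes rewards within one group of candidate warm starts for the same instance. It shows the generic group-relative advantage only. Using the negated iteration count as the reward, along with the function name and the sample counts, is an assumption made for illustration and is not taken from the paper.

```python
import numpy as np

def group_relative_advantages(iteration_counts, eps=1e-8):
    """Standardize rewards within one group of candidate warm starts.

    Assumption: reward = -iteration count, so fewer solver iterations is better.
    """
    rewards = -np.asarray(iteration_counts, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical iteration counts for four warm starts sampled for one instance.
counts = [4, 6, 5, 9]
advantages = group_relative_advantages(counts)
print(advantages)
```

Candidates that converge in fewer iterations than the group average receive a positive advantage and the slowest ones a negative advantage, so no separate value network is needed to form the baseline.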

Entities

—

Sources

arXiv cs.AI — 2026-05-13