Stabilized Neural HJB Solvers: Error Analysis for Model-Based RL
A new arXiv preprint (2605.07116) develops an error theory for a hybrid regime of physics-informed neural solvers for Hamilton-Jacobi-Bellman (HJB) equations in model-based reinforcement learning. In this regime, a neural network represents the value function, finite-difference HJB policy-evaluation operators are evaluated by querying the network at shifted points, and the resulting residuals are minimized by random continuous collocation. The scheme thus retains the stabilizing structure of finite-difference policy evaluation without introducing grid-based value unknowns. The authors prove a population L2 stability estimate for a single policy-evaluation step with learned dynamics, with an error bound that separates residual error, initial error, and exterior error. The work bridges classical grid-based methods and continuous-PDE PINNs, providing a theoretical foundation for practical implementations.
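The hybrid regime described above can be illustrated with a minimal sketch. Everything problem-specific here is an assumption for illustration only (a 1-D closed-loop drift, a quadratic running cost, a tiny fixed-weight MLP, and an upwind first-order difference); the preprint's actual scheme, network, and operators may differ. The point is the structure: the finite-difference policy-evaluation residual is formed purely from network queries at shifted states, and it is evaluated at randomly sampled collocation points rather than grid unknowns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny fixed-weight MLP standing in for the neural value function V_theta.
W1 = rng.normal(size=(1, 32)) * 0.5
b1 = np.zeros(32)
w2 = rng.normal(size=32) * 0.5

def value(x):
    """V_theta(x) for a batch of scalar states x."""
    h = np.tanh(x[:, None] @ W1 + b1)
    return h @ w2

# Hypothetical 1-D problem data (illustrative assumptions, not from the paper):
f = lambda x: -x        # closed-loop drift under a fixed policy
ell = lambda x: x**2    # running cost
rho = 0.1               # discount rate
h = 1e-2                # finite-difference step

def hjb_residual(x):
    """Upwind finite-difference policy-evaluation residual at states x,
    evaluated via network queries at the shifted points x + h or x - h
    (no grid-based value unknowns anywhere)."""
    drift = f(x)
    # Upwind: difference in the direction the dynamics move the state.
    shift = np.where(drift >= 0.0, h, -h)
    dV = (value(x + shift) - value(x)) / shift
    return rho * value(x) - ell(x) - drift * dV

# Random continuous collocation: sample states from the domain and form
# the mean-squared residual that training would minimize.
x_colloc = rng.uniform(-1.0, 1.0, size=256)
loss = np.mean(hjb_residual(x_colloc) ** 2)
```

In a full implementation the network weights would be trained by minimizing `loss` (e.g. with autodiff); the sketch stops at forming the residual, which is the object the paper's stability analysis controls.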
Key facts
- arXiv preprint 2605.07116
- Hybrid regime for neural HJB solvers
- Value function represented by neural network
- Finite-difference policy-evaluation operators evaluated at shifted points
- Residuals minimized by random continuous collocation
- Population L2 stability estimate proven
- Error bound separates residual, initial, and exterior errors
- Bridges grid methods and continuous-PDE PINNs
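The three-way error split in the key facts above can be written schematically. The symbols and constants here are placeholders, not the preprint's notation: $\widehat V$ is the learned value after one policy-evaluation step, $V^\pi$ the true policy value, $r$ the collocation residual, $e_{\mathrm{init}}$ the error in the starting value, and $e_{\mathrm{ext}}$ the error at shifted query points falling outside the domain $\Omega$. Schematically, a bound of this shape reads:

$$
\big\| \widehat V - V^{\pi} \big\|_{L^2(\Omega)}
\;\le\;
C_1 \,\| r \|_{L^2(\Omega)}
\;+\;
C_2 \,\| e_{\mathrm{init}} \|_{L^2(\Omega)}
\;+\;
C_3 \,\| e_{\mathrm{ext}} \|_{L^2(\Omega^c)} .
$$

The precise norms, constants, and dependence on the learned dynamics are given in the paper; the sketch only shows how the residual, initial, and exterior contributions enter additively.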