QuantFPFlow: Quantum Speedup for Continuous Reinforcement Learning
QuantFPFlow is a reinforcement learning framework that embeds quantum amplitude estimation in the Fokker-Planck (FP) formulation of stochastic policy optimization. Classical continuous-space RL agents estimate the FP partition function to precision ε at a cost of O(1/ε²); QuantFPFlow reduces this to O(1/ε) through Grover-amplified amplitude estimation, a demonstrable quadratic speedup. The full quantum acceleration requires fault-tolerant hardware, but a quantum-inspired classical simulation already exhibits the O(1/ε) algorithmic structure. The estimated stationary distribution yields a principled exploration bonus that steers the agent toward the globally optimal regions of multimodal reward landscapes.
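To make the O(1/ε²) versus O(1/ε) contrast concrete, the following minimal sketch compares how many samples a plain Monte Carlo estimator needs against how many Grover-style queries canonical amplitude estimation needs for the same target error ε. It is an illustration under stated assumptions, not QuantFPFlow's implementation: the function names are hypothetical, the classical count comes from the standard error of a Bernoulli mean, and the quantum count uses the textbook error bound 2π√(a(1−a))/M + π²/M² for canonical amplitude estimation (which holds with probability at least 8/π²). Whether QuantFPFlow uses this particular variant is not stated in the summary.

```python
import math
import random


def classical_samples(eps: float, a: float = 0.5) -> int:
    """Samples N so that the Monte Carlo standard error sqrt(a(1-a)/N) is <= eps."""
    return math.ceil(a * (1.0 - a) / eps ** 2)


def amplitude_estimation_queries(eps: float, a: float = 0.5) -> int:
    """Smallest M with 2*pi*sqrt(a(1-a))/M + pi^2/M^2 <= eps (textbook bound).

    Solving eps*M^2 - 2*pi*s*M - pi^2 >= 0 for M gives the closed form below.
    """
    s = math.sqrt(a * (1.0 - a))
    return math.ceil(math.pi * (s + math.sqrt(s * s + eps)) / eps)


def monte_carlo_estimate(a: float, n: int, seed: int = 0) -> float:
    """Classical baseline: average n Bernoulli(a) draws from a seeded generator."""
    rng = random.Random(seed)
    return sum(rng.random() < a for _ in range(n)) / n


if __name__ == "__main__":
    # Shrinking eps by 10x costs ~100x more classical samples but ~10x more queries.
    for eps in (1e-1, 1e-2, 1e-3):
        print(f"eps={eps:g}  classical={classical_samples(eps)}  "
              f"amplitude_estimation={amplitude_estimation_queries(eps)}")
```

Each tenfold reduction in ε multiplies the classical sample count by roughly one hundred and the amplitude-estimation query count by roughly ten, which is the quadratic gap described above.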
Key facts
- QuantFPFlow integrates quantum amplitude estimation into Fokker-Planck policy optimization.
- Estimating the FP partition function to precision ε costs O(1/ε²) classically; QuantFPFlow achieves O(1/ε).
- Full quantum acceleration requires fault-tolerant hardware.
- A quantum-inspired classical simulation already shows the O(1/ε) algorithmic structure.
- Exploration bonus uses estimated stationary distribution.
- Framework steers agents toward global optima in multimodal reward landscapes.
- Published on arXiv with ID 2605.16429.
- Announcement type is cross.
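The stationary-distribution facts above can be illustrated with a small classical sketch. For overdamped Langevin dynamics dx = −∇U(x) dt + √(2D) dW, the stationary solution of the Fokker-Planck equation is p(x) = exp(−U(x)/D) / Z, where Z is the partition function. The code below evaluates this on a 1D grid for a hypothetical bimodal reward, taking U = −reward as an assumption. The reward shape, the diffusion constant, and all function names are illustrative choices that do not come from the paper, and the summary does not give the formula for the exploration bonus, so the sketch stops at the density itself.

```python
import math


def reward(x: float) -> float:
    """Hypothetical bimodal reward: local peak near x = -1, global peak near x = 1.5."""
    local_peak = 1.0 * math.exp(-((x + 1.0) ** 2) / 0.1)
    global_peak = 2.0 * math.exp(-((x - 1.5) ** 2) / 0.1)
    return local_peak + global_peak


def stationary_density(xs, diffusion):
    """Grid estimate of p(x) = exp(-U(x)/D) / Z, assuming U = -reward."""
    dx = xs[1] - xs[0]
    weights = [math.exp(reward(x) / diffusion) for x in xs]
    z = sum(weights) * dx  # Riemann-sum approximation of the partition function
    return [w / z for w in weights], z


if __name__ == "__main__":
    xs = [-4.0 + 0.01 * i for i in range(801)]  # uniform grid on [-4, 4]
    p, z = stationary_density(xs, diffusion=0.25)
    dx = xs[1] - xs[0]
    mass_right = sum(pi for x, pi in zip(xs, p) if x > 0.0) * dx
    print(f"Z = {z:.2f}, mode at x = {xs[p.index(max(p))]:.2f}, "
          f"mass on the global-peak side = {mass_right:.3f}")
```

In this toy landscape most of the stationary probability mass sits around the global peak rather than the local one, which is the property an exploration signal derived from the estimated density can exploit.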
Entities
Institutions
- arXiv