ReaPER+: Replay-Buffer Engineering for Noise-Robust Quantum Circuit Optimization
A recent study presents ReaPER+, an annealed replay rule for deep reinforcement learning applied to quantum circuit optimization. Early in training it prioritizes transitions by temporal-difference (TD) error; as value estimates become more accurate, it shifts to reliability-aware sampling. The work targets three bottlenecks: replay buffers that ignore the reliability of TD targets, curriculum-based architecture search that requires complete quantum-classical evaluations, and the frequent discarding of noiseless trajectories because of hardware noise. On quantum compilation and quantum architecture search (QAS) benchmarks, ReaPER+ improves sample efficiency by 4-32x over fixed prioritized experience replay (PER), ReaPER, and uniform replay, and it consistently yields more compact circuits. Validation on LunarLander-v3 indicates that the principle is domain-agnostic, and the study positions the replay buffer as a primary algorithmic lever for quantum optimization.
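The annealed shift from TD-error prioritization to reliability-aware sampling can be sketched as follows. This is a minimal illustration under stated assumptions, not the rule defined in the paper: the linear schedule, the geometric blend of the two priorities, the per-transition `reliability` score in [0, 1], the exponent `alpha`, and the function names `anneal_weight`, `sampling_probs`, and `sample_batch` are all hypothetical.

```python
import numpy as np


def anneal_weight(step, total_steps):
    """Hypothetical linear schedule: 0 means pure TD-error priority,
    1 means fully reliability-aware priority."""
    return float(np.clip(step / total_steps, 0.0, 1.0))


def sampling_probs(td_errors, reliability, lam, alpha=0.6, eps=1e-6):
    """Blend TD-error priority with an assumed reliability score.

    lam = 0 gives priority |td|^alpha (standard PER-style priority);
    lam = 1 gives priority (reliability * |td|)^alpha.
    """
    td = np.abs(np.asarray(td_errors, dtype=float)) + eps
    # Clip away from zero so the priorities never sum to zero.
    rel = np.clip(np.asarray(reliability, dtype=float), eps, 1.0)
    priority = (td * rel ** lam) ** alpha
    return priority / priority.sum()


def sample_batch(rng, probs, batch_size):
    """Draw buffer indices (with replacement) according to the probabilities."""
    return rng.choice(len(probs), size=batch_size, p=probs)
```

In a training loop one would recompute `lam = anneal_weight(step, total_steps)` before each batch, so that transitions with large but unreliable TD targets are sampled less often as training progresses. How reliability is actually measured, and how the schedule is chosen, are left to the paper.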
Key facts
- ReaPER+ is an annealed replay rule for quantum circuit optimization.
- It transitions from TD error-driven prioritization to reliability-aware sampling.
- Addresses three bottlenecks: replay buffers that ignore TD target reliability, curriculum-based architecture search that requires full quantum-classical evaluation, and noiseless trajectories that are discarded because of hardware noise.
- Achieves 4-32x gains in sample efficiency over fixed PER, ReaPER, and uniform replay.
- Consistently discovers more compact circuits on quantum compilation and QAS benchmarks.
- Validated on LunarLander-v3, indicating that the principle is domain-agnostic.
- Treats the replay buffer as a primary algorithmic lever for quantum optimization.
- Published on arXiv with ID 2604.21863.
Entities
Institutions
- arXiv