Q-Learning Error Analysis with Sign Separation
A recent arXiv paper presents a finite-time error analysis of Q-learning with a constant step size that treats positive and negative errors separately. Using a switching-system representation, it shows that the negative part of the error is bounded by a lower comparison linear time-invariant (LTI) system associated with a fixed optimal policy, while the positive part is controlled by a linear switching system. The analysis identifies an asymmetry in the error dynamics that is tied to overestimation: the maximum in the Bellman operator can amplify positive errors, whereas negative errors remain constrained by the LTI comparison system. The paper gives finite-time bounds for both the deterministic and the stochastic setting under constant step sizes.
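To make the sign separation concrete, the following is a sketch in standard switching-system notation. The symbols are assumptions chosen for illustration and are not necessarily the paper's own: $Q^*$ is the optimal Q-function, $\alpha$ the constant step size, $\gamma$ the discount factor, $P$ the transition matrix, and $\Pi_Q$ the matrix that selects the greedy action of $Q$ in each state. It covers only the noise-free synchronous update.

```latex
% Error and its componentwise sign-separated parts
e_k = Q_k - Q^*, \qquad
e_k^{+} = \max\{e_k, 0\}, \qquad
e_k^{-} = \min\{e_k, 0\}, \qquad
e_k = e_k^{+} + e_k^{-}.

% Noise-free synchronous Q-learning update with constant step size
Q_{k+1} = Q_k + \alpha \left( R + \gamma P \Pi_{Q_k} Q_k - Q_k \right).

% With A_Q := I + \alpha (\gamma P \Pi_Q - I), the error is sandwiched componentwise:
A_{Q^*} \, e_k \;\le\; e_{k+1} \;\le\; A_{Q_k} \, e_k .
```

The left-hand matrix $A_{Q^*}$ is fixed, which gives an LTI lower comparison system. The right-hand matrix $A_{Q_k}$ changes with the current greedy policy, which gives a switching system. Both inequalities follow from $\Pi_{Q_k} Q_k \ge \Pi_{Q^*} Q_k$ and $\Pi_{Q^*} Q^* \ge \Pi_{Q_k} Q^*$.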
Key facts
- The paper is on arXiv with ID 2605.16103.
- It presents a sign-separated finite-time error analysis for Q-learning.
- The analysis uses a switching-system representation.
- Error is decomposed into componentwise negative and positive parts.
- Negative part is dominated by a lower comparison LTI system.
- Positive part is controlled by a linear switching system.
- A max-induced asymmetry in Q-learning error dynamics is identified.
- Finite-time bounds are given for deterministic and stochastic settings.
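The facts above can be illustrated numerically. The sketch below is not the paper's experiment: it builds a small random MDP (all sizes, seeds, and parameter values are hypothetical), runs noise-free synchronous Q-learning with a constant step size, splits the error into componentwise positive and negative parts, and checks that a lower comparison LTI system driven by the greedy policy of Q* stays below the true error at every step.

```python
import random

GAMMA, ALPHA = 0.9, 0.5   # discount and constant step size (illustrative values)
N_S, N_A = 3, 2           # tiny hypothetical MDP: 3 states, 2 actions

rng = random.Random(0)
# Random transition kernel P[s][a][s2] and rewards R[s][a]; purely illustrative.
P = []
for s in range(N_S):
    rows = []
    for a in range(N_A):
        w = [rng.random() for _ in range(N_S)]
        total = sum(w)
        rows.append([x / total for x in w])
    P.append(rows)
R = [[rng.random() for _ in range(N_A)] for _ in range(N_S)]

def bellman_step(Q, alpha):
    """One synchronous, noise-free Q-learning update with step size alpha."""
    new_Q = []
    for s in range(N_S):
        row = []
        for a in range(N_A):
            target = R[s][a] + GAMMA * sum(
                P[s][a][s2] * max(Q[s2]) for s2 in range(N_S))
            row.append(Q[s][a] + alpha * (target - Q[s][a]))
        new_Q.append(row)
    return new_Q

# Approximate Q* by iterating the Bellman optimality operator (alpha = 1).
Q_star = [[0.0] * N_A for _ in range(N_S)]
for _ in range(2000):
    Q_star = bellman_step(Q_star, 1.0)
pi_star = [max(range(N_A), key=lambda a: Q_star[s][a]) for s in range(N_S)]

def lti_step(L, alpha):
    """Lower comparison system: the max is replaced by the fixed greedy policy of Q*."""
    new_L = []
    for s in range(N_S):
        row = []
        for a in range(N_A):
            nxt = sum(P[s][a][s2] * L[s2][pi_star[s2]] for s2 in range(N_S))
            row.append((1 - alpha) * L[s][a] + alpha * GAMMA * nxt)
        new_L.append(row)
    return new_L

def flatten(M):
    return [x for row in M for x in row]

def split(errors):
    """Componentwise sign separation: e = e_pos + e_neg."""
    return [max(e, 0.0) for e in errors], [min(e, 0.0) for e in errors]

# Start with mixed-sign errors around Q*; the comparison system starts at the same error.
Q = [[Q_star[s][a] + rng.uniform(-1, 1) for a in range(N_A)] for s in range(N_S)]
L = [[Q[s][a] - Q_star[s][a] for a in range(N_A)] for s in range(N_S)]

bound_holds = True
history = []  # (largest positive error, most negative error) per iteration
for k in range(50):
    e = [q - qs for q, qs in zip(flatten(Q), flatten(Q_star))]
    e_pos, e_neg = split(e)
    # Small tolerance absorbs floating-point rounding.
    bound_holds = bound_holds and all(
        ei >= li - 1e-9 for ei, li in zip(e, flatten(L)))
    history.append((max(e_pos), min(e_neg)))
    Q, L = bellman_step(Q, ALPHA), lti_step(L, ALPHA)

print("lower comparison bound held at every step:", bound_holds)
print("first (max positive, min negative) error:", history[0])
print("last  (max positive, min negative) error:", history[-1])
```

Since the comparison system lower-bounds the full error, its negative part also lower-bounds the negative part of the error. The positive part has no such fixed-matrix bound in this sketch, because the update matrix depends on the current greedy policy.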
Entities
Institutions
- arXiv