Low-Rank Bandits with Subspace Drift: Tight Bounds
A recent arXiv preprint (2605.20269) studies low-rank linear contextual bandits whose latent subspace is non-stationary, shifting at unknown segment boundaries. The authors establish tight identification and regret bounds. They show that recovering the subspace requires three probing conditions: known noise variance, limited state-noise coupling, and comprehensive probe support. They also derive a minimax lower bound of Ω(r√(KT)) and present an algorithm that attains Õ(r√(KT)) regret, matching the lower bound up to logarithmic factors.
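To make the setting concrete, here is a minimal simulation sketch of a piecewise-stationary low-rank reward model. It is not the paper's algorithm or its exact model: the reward form (context times parameter plus Gaussian noise), the function names, the dimensions, and the boundary positions are all illustrative assumptions. Within each segment the reward parameter lies in an r-dimensional subspace, and the subspace is redrawn at each boundary.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def random_orthonormal_basis(d, r, rng):
    """Return r orthonormal vectors in R^d via Gram-Schmidt on Gaussian draws."""
    basis = []
    while len(basis) < r:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in basis:
            coeff = dot(v, u)
            v = [a - coeff * b for a, b in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-9:  # skip a degenerate draw
            basis.append([a / norm for a in v])
    return basis


def residual_norm(theta, basis):
    """Norm of the part of theta lying outside span(basis)."""
    resid = list(theta)
    for u in basis:
        coeff = dot(resid, u)
        resid = [a - coeff * b for a, b in zip(resid, u)]
    return math.sqrt(dot(resid, resid))


def simulate(d=8, r=2, boundaries=(0, 100, 250), horizon=400,
             noise_std=0.1, seed=0):
    """Simulate scalar rewards under an assumed drifting low-rank model.

    Assumption (not from the source): reward_t = <x_t, theta_k> + noise,
    where theta_k lies in the r-dimensional subspace of segment k.
    """
    rng = random.Random(seed)
    params = []
    for _ in boundaries:
        basis = random_orthonormal_basis(d, r, rng)
        weights = [rng.gauss(0.0, 1.0) for _ in range(r)]
        # theta is a linear combination of the basis vectors.
        theta = [sum(weights[j] * basis[j][i] for j in range(r))
                 for i in range(d)]
        params.append((basis, theta))

    history = []
    seg = 0
    for t in range(horizon):
        # Advance to the next segment once its start index is reached.
        while seg + 1 < len(boundaries) and t >= boundaries[seg + 1]:
            seg += 1
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        reward = dot(x, params[seg][1]) + rng.gauss(0.0, noise_std)
        history.append((seg, x, reward))
    return params, history


params, history = simulate()
```

A learner in this setting would see only the contexts and scalar rewards in `history`, not the segment index or the bases, which is why the boundaries count as unknown.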
Key facts
- Paper: arXiv:2605.20269
- Studies piecewise-stationary low-rank linear contextual bandits
- Rewards live on a low-dimensional latent subspace that drifts
- Three necessary conditions for subspace identification: known noise variance, limited state-noise coupling, and comprehensive probe support
- Minimax lower bound Ω(r√(KT))
- Algorithm achieves Õ(r√(KT)) regret, matching the lower bound up to logarithmic factors
- Tight bounds along three axes
- Single-play scalar rewards
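The Ω(r√(KT)) lower bound and the Õ(r√(KT)) upper bound share the same leading term, so a quick calculation shows how that term scales. The sketch below treats r, K, and T as plain positive numbers, since this summary does not define them. It omits the constants and logarithmic factors hidden by the Ω and Õ notation, so the values indicate growth rates only, not actual regret. The function name `leading_term` is hypothetical.

```python
import math


def leading_term(r, K, T):
    """Leading term r * sqrt(K * T), without constants or log factors."""
    return r * math.sqrt(K * T)


base = leading_term(2, 4, 100)          # 2 * sqrt(400) = 40.0
quadruple_T = leading_term(2, 4, 400)   # 2 * sqrt(1600) = 80.0
double_r = leading_term(4, 4, 100)      # 4 * sqrt(400) = 80.0
```

Quadrupling T or doubling r each doubles the leading term: the rate is linear in r but grows only with the square root of K and T.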
Entities
Institutions
- arXiv