RELO: Reinforcement Learning Improves Visual Object Tracking
Researchers have introduced RELO, a reinforcement-learning-based method for visual object tracking that replaces traditional handcrafted spatial priors with a learned localization policy. The method formulates target localization as a Markov decision process, using rewards that combine frame-level intersection over union (IoU) and sequence-level area under the success curve (AUC). A layer-aligned temporal token propagation module enhances semantic consistency across frames with negligible computational overhead. On the LaSOT_ext benchmark, RELO achieves 57.5% AUC without template updates, outperforming prior methods. The approach directly optimizes tracking metrics, addressing the misalignment between surrogate supervision and actual evaluation criteria.
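To make the reward description concrete, here is a minimal sketch of how a frame-level IoU term and a sequence-level success-curve AUC term could be computed and combined. It is an illustration under stated assumptions, not RELO's actual reward: the summary does not say how the two terms are weighted or combined, so the additive form, the `seq_weight` parameter, the `(x, y, w, h)` box format, and the 21 evenly spaced thresholds are all assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(ious, num_thresholds=21):
    """Approximate area under the success curve.

    The success rate at threshold t is the fraction of frames whose IoU
    exceeds t; the AUC is taken here as the mean success rate over evenly
    spaced thresholds in [0, 1] (threshold count is an assumption).
    """
    if not ious:
        return 0.0
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(v > t for v in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


def combined_rewards(pred_boxes, gt_boxes, seq_weight=0.5):
    """Per-frame reward: frame IoU plus a shared sequence-level AUC bonus.

    The additive combination and seq_weight are hypothetical choices.
    """
    ious = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    auc = success_auc(ious)
    return [v + seq_weight * auc for v in ious]


if __name__ == "__main__":
    preds = [(0, 0, 2, 2), (1, 1, 2, 2)]
    gts = [(0, 0, 2, 2), (0, 0, 2, 2)]
    print(combined_rewards(preds, gts))
```

In this sketch every frame in a sequence shares the same AUC bonus, so a policy is rewarded both for accurate individual boxes and for keeping overlap high across the whole sequence, which is the kind of direct metric optimization the summary describes.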
Key facts
- RELO replaces handcrafted spatial priors with a learned localization policy via reinforcement learning.
- Target localization is formulated as a Markov decision process.
- Rewards combine frame-level IoU and sequence-level AUC.
- Layer-aligned temporal token propagation improves semantic consistency across frames.
- Achieves 57.5% AUC on LaSOT_ext without template updates.
- Method addresses misalignment between surrogate supervision and tracking metrics.