RELO: Reinforcement Learning Improves Visual Object Tracking

ai-technology · 2026-05-11

Researchers have introduced RELO, a reinforcement-learning-based method for visual object tracking that replaces traditional handcrafted spatial priors with a learned localization policy. The method formulates target localization as a Markov decision process, using rewards that combine frame-level intersection over union (IoU) and sequence-level area under the success curve (AUC). A layer-aligned temporal token propagation module enhances semantic consistency across frames with negligible computational overhead. On the LaSOT_ext benchmark, RELO achieves 57.5% AUC without template updates, outperforming prior methods. The approach directly optimizes tracking metrics, addressing the misalignment between surrogate supervision and actual evaluation criteria.

Key facts

RELO replaces handcrafted spatial priors with a learned localization policy via reinforcement learning.
Target localization is formulated as a Markov decision process.
Rewards combine frame-level IoU and sequence-level AUC.
Layer-aligned temporal token propagation improves semantic consistency across frames.
Achieves 57.5% AUC on LaSOT_ext without template updates.
Method addresses misalignment between surrogate supervision and tracking metrics.
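
To make the reward idea concrete, here is a minimal Python sketch of how a frame-level IoU term and a sequence-level success-AUC term could be blended into per-frame rewards. It is an illustration under stated assumptions, not RELO's actual reward: the function names, the (x, y, w, h) box format, the 21-threshold success curve, and the linear mixing weight seq_weight are all hypothetical choices, since the summary does not say how the two terms are combined.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(ious: Sequence[float], num_thresholds: int = 21) -> float:
    """Area under the success curve: mean fraction of frames whose IoU
    exceeds each overlap threshold, with thresholds evenly spaced in [0, 1]."""
    if not ious:
        raise ValueError("need at least one frame")
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(v > t for v in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


def combined_rewards(
    predicted: Sequence[Box],
    ground_truth: Sequence[Box],
    seq_weight: float = 0.5,  # hypothetical mixing weight, not from the source
) -> List[float]:
    """Per-frame reward: a blend of that frame's IoU and the sequence AUC."""
    ious = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    auc = success_auc(ious)
    return [(1.0 - seq_weight) * v + seq_weight * auc for v in ious]
```

In this sketch every frame in a sequence shares the same AUC term, so a policy is rewarded both for accurate boxes on individual frames and for keeping overlap high across the whole sequence, which is the quantity the benchmark reports.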

Sources

arXiv cs.AI — 2026-05-11