ARTFEED — Contemporary Art Intelligence

Event Sparsity-Aware Transformer for Visual Object Tracking

ai-technology · 2026-05-09

Researchers propose a sparsity-aware Mixture-of-Experts Transformer for event-based visual object tracking. Event cameras, which capture asynchronous per-pixel brightness changes, offer advantages over RGB sensors in low light and under fast motion. Existing trackers often ignore event data's spatial sparsity and temporal density, relying instead on a fixed temporal-window sampling strategy. The new framework models event-density variations across multiple temporal scales, injecting sparse, medium-density, and dense event regions into a three-stage Vision Transformer backbone for hierarchical multi-density feature learning; a sparsity-aware routing mechanism adaptively selects the most relevant expert for each region. Experiments on the FE108, VisEvent, and COESOT datasets show state-of-the-art performance, particularly under challenging conditions. By exploiting the sparsity structure of event data directly, the work addresses a key limitation of existing event-based trackers.
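The routing idea described above can be sketched in miniature: measure event density per spatial region, then dispatch each region's features to one of three experts. This is a hedged toy illustration, not the paper's implementation; the thresholds, expert shapes, and the `route` helper are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-region event density: fraction of active pixels in each spatial
# region (token). Values in [0, 1]; sizes and thresholds are illustrative.
num_tokens, dim = 16, 8
density = rng.uniform(0.0, 1.0, size=num_tokens)
tokens = rng.normal(size=(num_tokens, dim))  # per-region feature vectors

# Three toy "experts": small linear maps specialized per density regime.
experts = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3)]

def route(density, low=0.33, high=0.66):
    """Map each region's density to an expert index: 0=sparse, 1=medium, 2=dense."""
    return np.digitize(density, [low, high])

idx = route(density)
out = np.stack([tokens[i] @ experts[idx[i]] for i in range(num_tokens)])
print(out.shape)  # (16, 8)
```

A learned gate (e.g. a softmax over expert logits) would replace the hard thresholds in practice; the hard-threshold version just makes the sparse/medium/dense split explicit.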

Key facts

  • Proposes sparsity-aware Mixture-of-Experts Transformer for event-based tracking
  • Models event-density variations across multiple temporal scales
  • Injects sparse, medium-density, and dense event regions into three-stage Vision Transformer
  • Introduces sparsity-aware routing mechanism for expert selection
  • Achieves state-of-the-art on FE108, VisEvent, and COESOT datasets
  • Addresses limitations of fixed temporal-window sampling in existing trackers
  • Event cameras provide high dynamic range and temporal resolution
  • RGB-based trackers vulnerable to low illumination and fast motion
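The multi-scale temporal modeling listed above can be illustrated with a small sketch: instead of one fixed temporal window, accumulate event counts over several window lengths, giving density maps at different temporal scales. The event layout, window lengths, and `density_map` helper are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic event stream over a 100 ms span: columns = (timestamp_us, x, y).
n_events = 10_000
events = np.column_stack([
    np.sort(rng.uniform(0, 1e5, n_events)),  # microsecond timestamps
    rng.integers(0, 64, n_events),           # x coordinate
    rng.integers(0, 64, n_events),           # y coordinate
])

def density_map(events, t0, t1, hw=(64, 64)):
    """Per-pixel event counts within [t0, t1) — one temporal slice."""
    sel = events[(events[:, 0] >= t0) & (events[:, 0] < t1)]
    hist, _, _ = np.histogram2d(sel[:, 2], sel[:, 1],
                                bins=hw, range=[[0, hw[0]], [0, hw[1]]])
    return hist

# Three temporal scales (short/medium/long windows), all ending at t = 100 ms.
scales_us = [5_000, 20_000, 100_000]
maps = [density_map(events, 1e5 - w, 1e5) for w in scales_us]
print([int(m.sum()) for m in maps])  # longer windows accumulate more events
```

Thresholding such maps is one simple way to separate sparse, medium-density, and dense regions before feeding them to the backbone.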
