TAVIS Benchmark for Active Vision in Imitation Learning
Researchers have released TAVIS, a benchmark for evaluating active vision in imitation learning. Active vision lets a policy direct its own gaze while performing a task; prior work has shown its benefits, but the field has lacked a unified evaluation framework. TAVIS comprises two task suites: TAVIS-Head, five tasks that use pan/tilt necks for global search, and TAVIS-Hands, three tasks that use wrist cameras to handle local occlusion. Both suites are implemented on two humanoid-torso embodiments, GR1T2 and Reachy2, built in IsaacLab. The benchmark also provides three evaluation components: a paired headcam-versus-fixed-cam comparison, the Gaze-Action Lead Time (GALT) metric for quantifying predictive gaze, and procedural ID/OOD splits.
Key facts
- TAVIS is a benchmark for active-vision imitation learning.
- It includes TAVIS-Head (5 tasks) and TAVIS-Hands (3 tasks).
- Embodiments used: GR1T2 and Reachy2.
- Built on IsaacLab.
- GALT metric quantifies anticipatory gaze.
- Paired headcam-vs-fixedcam protocol is included.
- Procedural ID/OOD splits are provided.
- Active vision has emerged as a key capability in imitation learning.
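The source does not give GALT's exact formula. As a rough illustration of what a "gaze-action lead time" could measure, here is a minimal sketch under assumed conventions: per-timestep boolean logs of whether the gaze is fixated on the task-relevant object and whether the manipulator is acting on it, with lead time defined as the gap between first fixation and first action (positive means gaze anticipates action). The function name, signature, and control rate are all hypothetical, not from TAVIS.

```python
from typing import Optional, Sequence


def gaze_action_lead_time(
    gaze_on_target: Sequence[bool],
    action_on_target: Sequence[bool],
    dt: float = 1.0 / 30.0,  # assumed control rate (30 Hz); hypothetical
) -> Optional[float]:
    """Sketch of a GALT-style metric: seconds by which gaze first fixates
    the task-relevant object before the first action on it.

    Positive values indicate anticipatory (predictive) gaze. NOTE: the
    actual TAVIS definition of GALT may differ; this is an assumption.
    """
    try:
        t_gaze = list(gaze_on_target).index(True)    # first fixation step
        t_act = list(action_on_target).index(True)   # first action step
    except ValueError:
        return None  # gaze or action never reached the target
    return (t_act - t_gaze) * dt


# Example: gaze locks on at step 2, the hand acts at step 8,
# so gaze leads action by 6 steps (~0.2 s at the assumed 30 Hz).
gaze = [False, False, True, True, True, True, True, True, True, True]
act = [False] * 8 + [True, True]
print(gaze_action_lead_time(gaze, act))
```

Under this toy definition, a policy that looks at the object only after touching it would score a non-positive lead time, while anticipatory gaze scores positive.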