TAVIS Benchmark for Active Vision in Imitation Learning
Researchers have released TAVIS, a benchmark for evaluating active vision in imitation learning. Active vision lets a policy direct its own gaze while performing a task; prior work has shown its benefits, but the field has lacked a unified evaluation framework. TAVIS comprises two task suites: TAVIS-Head, five tasks that use pan/tilt necks for global search, and TAVIS-Hands, three tasks that use wrist cameras to handle local occlusion. Both suites are implemented on two humanoid-torso embodiments, GR1T2 and Reachy2, built in IsaacLab. The benchmark also provides three evaluation components: a paired headcam-versus-fixed-cam comparison, the Gaze-Action Lead Time (GALT) metric for quantifying predictive gaze, and procedural ID/OOD splits.
Key facts
- TAVIS is a benchmark for active-vision imitation learning.
- It includes TAVIS-Head (5 tasks) and TAVIS-Hands (3 tasks).
- Embodiments used: GR1T2 and Reachy2.
- Built on IsaacLab.
- GALT metric quantifies anticipatory gaze.
- Paired headcam-vs-fixedcam protocol is included.
- Procedural ID/OOD splits are provided.
- Active vision has emerged as a key capability in imitation learning.
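The source does not give GALT's exact formula. As a rough illustration of what a "gaze-action lead time" could measure, here is a minimal sketch under assumed conventions: per-timestep boolean logs of whether the gaze is fixated on the task-relevant object and whether the manipulator is acting on it, with lead time defined as the gap between first fixation and first action (positive means gaze anticipates action). The function name, signature, and control rate are all hypothetical, not from TAVIS.

```python
from typing import Optional, Sequence


def gaze_action_lead_time(
    gaze_on_target: Sequence[bool],
    action_on_target: Sequence[bool],
    dt: float = 1.0 / 30.0,  # assumed control rate (30 Hz); hypothetical
) -> Optional[float]:
    """Sketch of a GALT-style metric: seconds by which gaze first fixates
    the task-relevant object before the first action on it.

    Positive values indicate anticipatory (predictive) gaze. NOTE: the
    actual TAVIS definition of GALT may differ; this is an assumption.
    """
    try:
        t_gaze = list(gaze_on_target).index(True)    # first fixation step
        t_act = list(action_on_target).index(True)   # first action step
    except ValueError:
        return None  # gaze or action never reached the target
    return (t_act - t_gaze) * dt


# Example: gaze locks on at step 2, the hand acts at step 8,
# so gaze leads action by 6 steps (~0.2 s at the assumed 30 Hz).
gaze = [False, False, True, True, True, True, True, True, True, True]
act = [False] * 8 + [True, True]
print(gaze_action_lead_time(gaze, act))
```

Under this toy definition, a policy that looks at the object only after touching it would score a non-positive lead time, while anticipatory gaze scores positive.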