IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation

ai-technology · 2026-05-16

A new AI framework, IntentVLA, addresses the problem of observation aliasing in robot imitation learning, where similar visual-language inputs can lead to different actions due to varying human intents. The system encodes recent visual history into a compact short-horizon intent representation to condition action chunk generation, reducing inter-chunk conflicts. The researchers also introduce AliasBench, a 12-task benchmark designed to isolate short-horizon observation aliasing. Tests across AliasBench, SimplerEnv, LIBERO, and RoboCasa show improvements in rollout consistency.
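The aliasing problem described above can be illustrated with a toy sketch. This is not the paper's architecture: the 2-D "frames", the `encode_intent` function, and both policy functions are hypothetical stand-ins, assuming only that intent can be read from recent motion. Two histories end in the same frame, so a frame-conditioned policy cannot tell them apart, while a history-conditioned one can.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for an image: a 2-D gripper position.
Frame = Tuple[float, float]


def encode_intent(history: Sequence[Frame]) -> Tuple[float, float]:
    """Toy intent feature (an assumption): mean displacement per step over recent frames."""
    if len(history) < 2:
        return (0.0, 0.0)
    steps = len(history) - 1
    dx = (history[-1][0] - history[0][0]) / steps
    dy = (history[-1][1] - history[0][1]) / steps
    return (dx, dy)


def frame_policy(frame: Frame, chunk_len: int = 4) -> List[Frame]:
    """Frame-conditioned baseline: sees only the current frame, so aliased
    frames always map to the same action chunk (here, holding position)."""
    return [frame for _ in range(chunk_len)]


def intent_policy(history: Sequence[Frame], chunk_len: int = 4) -> List[Frame]:
    """History-conditioned sketch: continue along the direction inferred from history."""
    x, y = history[-1]
    dx, dy = encode_intent(history)
    return [(x + dx * (k + 1), y + dy * (k + 1)) for k in range(chunk_len)]


# Two demonstrations that pass through the same frame with opposite intents.
moving_right = [(-2.0, 0.0), (-1.0, 0.0), (0.0, 0.0)]
moving_left = [(2.0, 0.0), (1.0, 0.0), (0.0, 0.0)]

same_chunk = frame_policy(moving_right[-1]) == frame_policy(moving_left[-1])
chunk_right = intent_policy(moving_right)
chunk_left = intent_policy(moving_left)
```

Here `same_chunk` is true because the last frames are identical, whereas `chunk_right` and `chunk_left` head in opposite directions. A learned encoder over image features would replace the hand-written displacement feature in any real system.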

Key facts

IntentVLA is a history-conditioned VLA framework
It encodes recent visual observations into a short-horizon intent representation
AliasBench is a 12-task ambiguity-aware benchmark on RoboTwin2
Tests conducted on AliasBench, SimplerEnv, LIBERO, and RoboCasa
The framework improves rollout consistency under partial observability
Human demonstrators act with different short-horizon intents, which makes the demonstration data multimodal
Existing frame-conditioned VLA policies may resample different intents across replanning steps
The paper is available on arXiv with ID 2605.14712
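
The replanning issue noted in the list above can be sketched as a small simulation. Everything here is an illustrative assumption rather than the paper's method: two hypothetical intent modes, a frame-conditioned rollout that samples a mode independently at every replanning step, and an idealized history-conditioned rollout that reads the mode back from what was just executed.

```python
import random
from typing import List, Sequence

# Hypothetical two-mode demonstration data for an aliased observation.
INTENTS = ("left", "right")


def count_switches(intents: Sequence[str]) -> int:
    """Number of times consecutive action chunks disagree on intent."""
    return sum(a != b for a, b in zip(intents, intents[1:]))


def rollout_frame_conditioned(n_replans: int, rng: random.Random) -> List[str]:
    """Each replanning step sees the same aliased frame and samples a mode afresh."""
    return [rng.choice(INTENTS) for _ in range(n_replans)]


def rollout_history_conditioned(n_replans: int, rng: random.Random) -> List[str]:
    """Idealized case: the first chunk samples a mode, and later chunks recover
    that mode from recent history instead of resampling."""
    chosen = [rng.choice(INTENTS)]
    for _ in range(n_replans - 1):
        chosen.append(chosen[-1])
    return chosen


rng = random.Random(0)
resampled = rollout_frame_conditioned(50, rng)
committed = rollout_history_conditioned(50, rng)

resampled_switches = count_switches(resampled)
committed_switches = count_switches(committed)
```

In this idealized setup `committed_switches` is always zero, while `resampled_switches` depends on the random draws. Real policies sit between these extremes, since an intent representation inferred from images is only an estimate.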
