IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation
IntentVLA is a new framework that addresses observation aliasing in robot imitation learning: similar visual-language inputs can correspond to different demonstrated actions because the human demonstrators were acting on different intents. The system encodes recent visual history into a compact short-horizon intent representation and conditions action chunk generation on it, which reduces conflicts between consecutive chunks. The researchers also introduce AliasBench, a 12-task benchmark designed to isolate short-horizon observation aliasing. Experiments on AliasBench, SimplerEnv, LIBERO, and RoboCasa show improved rollout consistency.
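The idea of conditioning an action chunk on recent history can be illustrated with a toy sketch. This is not the paper's architecture: the names `encode_intent` and `generate_chunk`, the 2-D "frames", and the choice of average frame-to-frame change as the intent are all assumptions made for illustration. It only shows how two histories that end in the same (aliased) frame can still yield different, internally consistent chunks.

```python
from typing import List, Sequence

Frame = Sequence[float]  # hypothetical stand-in for a visual feature vector


def encode_intent(history: List[Frame]) -> List[float]:
    """Summarize recent frames as their average frame-to-frame change.

    Assumes at least one frame; with a single frame there is no motion
    to read, so the intent is a zero vector.
    """
    dims = len(history[0])
    if len(history) < 2:
        return [0.0] * dims
    steps = len(history) - 1
    return [
        sum(history[t + 1][d] - history[t][d] for t in range(steps)) / steps
        for d in range(dims)
    ]


def generate_chunk(
    current: Frame, intent: Sequence[float], chunk_len: int = 4
) -> List[List[float]]:
    """Emit chunk_len waypoints that continue along the intent direction."""
    return [
        [c + (k + 1) * i for c, i in zip(current, intent)]
        for k in range(chunk_len)
    ]


# Two histories that end in the same frame (aliased) but approach it differently.
from_left = [[-2.0, 0.0], [-1.0, 0.0], [0.0, 0.0]]
from_right = [[2.0, 0.0], [1.0, 0.0], [0.0, 0.0]]

chunk_a = generate_chunk(from_left[-1], encode_intent(from_left))
chunk_b = generate_chunk(from_right[-1], encode_intent(from_right))

print(chunk_a[0], chunk_b[0])
```

A policy that saw only the final frame would receive identical input in both cases; here the history summary is what separates them.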
Key facts
- IntentVLA is a history-conditioned vision-language-action (VLA) framework
- It encodes recent visual observations into a short-horizon intent representation
- AliasBench is a 12-task ambiguity-aware benchmark on RoboTwin2
- Tests were conducted on AliasBench, SimplerEnv, LIBERO, and RoboCasa
- The framework improves rollout consistency under partial observability
- Human demonstrators act with different short-horizon intents, which makes the demonstration data multimodal
- Existing frame-conditioned VLA policies may resample different intents across replanning steps
- The paper is available on arXiv with ID 2605.14712
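The resampling problem noted in the key facts can be sketched with a small simulation. Everything here is a simplifying assumption rather than the paper's setup: intents are reduced to two directions, every replanning step is assumed to see the same aliased frame, and the function names are hypothetical. The sketch counts how often consecutive chunks disagree when intent is redrawn at each step versus carried over from recent history.

```python
import random
from typing import List

INTENTS = (-1, +1)  # e.g. pass an obstacle on the left (-1) or on the right (+1)


def frame_conditioned_rollout(num_chunks: int, rng: random.Random) -> List[int]:
    """Each replanning step sees the same aliased frame, so it draws a fresh intent."""
    return [rng.choice(INTENTS) for _ in range(num_chunks)]


def history_conditioned_rollout(num_chunks: int, rng: random.Random) -> List[int]:
    """The first chunk picks an intent; later chunks keep it, as if read from recent motion."""
    intent = rng.choice(INTENTS)
    return [intent] * num_chunks


def count_conflicts(chunk_intents: List[int]) -> int:
    """Number of consecutive chunk pairs that disagree on direction."""
    return sum(a != b for a, b in zip(chunk_intents, chunk_intents[1:]))


rng = random.Random(0)
frame_conflicts = count_conflicts(frame_conditioned_rollout(20, rng))
history_conflicts = count_conflicts(history_conditioned_rollout(20, rng))
print(frame_conflicts, history_conflicts)
```

In this toy model the history-conditioned rollout has zero conflicts by construction, while the frame-conditioned one flips direction whenever two consecutive draws differ.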
Entities
Institutions
- arXiv