ARTFEED — Contemporary Art Intelligence

IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation

ai-technology · 2026-05-16

A new AI framework, IntentVLA, targets observation aliasing in robot imitation learning: near-identical visual-language inputs can call for different actions because the human demonstrators were pursuing different short-horizon intents. The system encodes recent visual history into a compact short-horizon intent representation and conditions action chunk generation on it, which reduces conflicts between consecutive chunks. The researchers also introduce AliasBench, a 12-task benchmark built on RoboTwin2 and designed to isolate short-horizon observation aliasing. The paper reports improved rollout consistency across AliasBench, SimplerEnv, LIBERO, and RoboCasa.

Key facts

  • IntentVLA is a history-conditioned vision-language-action (VLA) framework
  • It encodes recent visual observations into a short-horizon intent representation
  • AliasBench is a 12-task ambiguity-aware benchmark on RoboTwin2
  • Tests were conducted on AliasBench, SimplerEnv, LIBERO, and RoboCasa
  • The framework improves rollout consistency under partial observability
  • Human demonstrators act with different short-horizon intents, which makes the demonstration data multimodal
  • Existing frame-conditioned VLA policies may resample different intents across replanning steps
  • The paper is available on arXiv with ID 2605.14712
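
The aliasing problem described above can be illustrated with a toy sketch. This is not the IntentVLA architecture or anything taken from the paper: the 1-D world, the chunk length, and every function name (`frame_policy`, `encode_intent`, `history_policy`, `rollout`, `count_switches`) are assumptions made up for illustration. A robot starts at position 0 and may legitimately move left or right. A policy that sees only the current frame re-samples that choice at every replanning step, while a policy that also summarizes recent positions keeps the choice it already started executing.

```python
import random

CHUNK = 4  # actions per chunk (hypothetical value, chosen for the demo)


def frame_policy(rng):
    """Toy frame-conditioned policy: the current frame does not reveal the
    intent, so it samples left (-1) or right (+1) anew for every chunk."""
    direction = rng.choice([-1, 1])
    return [direction] * CHUNK


def encode_intent(history):
    """Toy intent encoder: compress recent positions into one number,
    the sign of the net displacement over the window (0 if unknown)."""
    if len(history) < 2:
        return 0
    delta = history[-1] - history[0]
    return (delta > 0) - (delta < 0)


def history_policy(history, rng):
    """Toy history-conditioned policy: reuse the intent implied by recent
    motion and sample a fresh one only when the history is uninformative."""
    intent = encode_intent(history)
    if intent == 0:
        intent = rng.choice([-1, 1])
    return [intent] * CHUNK


def rollout(kind, n_chunks, seed, window=CHUNK + 1):
    """Run n_chunks replanning steps and return each chunk's direction."""
    rng = random.Random(seed)
    positions = [0]
    chunk_dirs = []
    for _ in range(n_chunks):
        if kind == "frame":
            chunk = frame_policy(rng)
        else:
            chunk = history_policy(positions[-window:], rng)
        chunk_dirs.append(chunk[0])
        for action in chunk:
            positions.append(positions[-1] + action)
    return chunk_dirs


def count_switches(chunk_dirs):
    """Count direction changes between consecutive chunks."""
    return sum(a != b for a, b in zip(chunk_dirs, chunk_dirs[1:]))


if __name__ == "__main__":
    for kind in ("frame", "history"):
        dirs = rollout(kind, n_chunks=8, seed=0)
        print(kind, dirs, "switches:", count_switches(dirs))
```

In this toy setting the history-conditioned rollout never changes direction after its first chunk, whereas the frame-conditioned rollout can reverse at any replanning step. A learned intent representation over visual history would stand in for the hand-written `encode_intent` here.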

Entities

Institutions

  • arXiv
