CorridorVLA: Sparse Spatial Anchors Improve Robot Action Generation

ai-technology · 2026-04-25

Researchers propose CorridorVLA, a method for Vision-Language-Action (VLA) models that uses sparse spatial anchors to impose an explicit tolerance region on action generation. The anchors define a corridor that guides a flow-matching action head: trajectories that leave the corridor are corrected, while minor deviations inside it are permitted. On the LIBERO-Plus benchmark, CorridorVLA improves success rates by 3.4%–12.4% over baselines, and the GR00T-Corr variant reaches an 83.21% success rate. The approach targets the problem of injecting spatial guidance explicitly rather than implicitly through latent features.
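The tolerance-region idea can be illustrated with a hinge-style penalty that is zero inside the corridor and grows outside it. This is a minimal sketch under stated assumptions, not the paper's loss: the function name `corridor_penalty`, the fixed scalar `radius`, and the squared-excess form are all hypothetical choices made for illustration.

```python
import numpy as np

def corridor_penalty(traj, center, radius):
    """Hypothetical corridor penalty: zero inside the tolerance region,
    quadratic in the excess distance outside it.

    traj, center: arrays of shape (T, 3), positions per time step.
    radius: corridor half-width, in the same units as the positions.
    """
    traj = np.asarray(traj, dtype=float)
    center = np.asarray(center, dtype=float)
    # Per-step Euclidean distance from the trajectory to the corridor centerline.
    dist = np.linalg.norm(traj - center, axis=-1)
    # Only the part of the distance that exceeds the radius is penalized.
    excess = np.maximum(dist - radius, 0.0)
    return float(np.mean(excess ** 2))

center = np.zeros((4, 3))
inside = center + np.array([0.01, 0.0, 0.0])   # 1 cm off the centerline
outside = center + np.array([0.05, 0.0, 0.0])  # 5 cm off the centerline

loss_inside = corridor_penalty(inside, center, radius=0.02)
loss_outside = corridor_penalty(outside, center, radius=0.02)
```

With a 2 cm radius, the first trajectory incurs no penalty and therefore no gradient, while the second is penalized only for the 3 cm by which it exceeds the corridor.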

Key facts

CorridorVLA predicts sparse spatial anchors as incremental physical changes (Δ-positions).
Anchors define a tolerance region in the training objective for action generation.
Trajectories outside the corridor receive corrective gradients.
Minor deviations caused by contacts and execution noise are permitted.
Tested on the LIBERO-Plus benchmark.
Consistent gains across SmolVLA and GR00T models.
Success rate improvement of 3.4%–12.4% over baselines.
The GR00T-Corr variant achieves an 83.21% success rate.
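As a rough illustration of how anchors predicted as Δ-positions could be turned into a corridor centerline, the sketch below accumulates the increments into absolute waypoints and interpolates linearly between them. The helper names, the linear interpolation, and the `steps_per_segment` parameter are assumptions for illustration; the summary does not specify how the authors construct the corridor.

```python
import numpy as np

def anchors_from_deltas(start, deltas):
    """Accumulate incremental position changes into absolute anchor positions."""
    start = np.asarray(start, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    return start + np.cumsum(deltas, axis=0)

def corridor_centerline(start, anchors, steps_per_segment):
    """Linearly interpolate a dense centerline through the start and the anchors
    (hypothetical choice; any smooth interpolant could be substituted)."""
    waypoints = np.vstack([start, anchors])  # shape (K + 1, 3)
    segments = []
    for a, b in zip(waypoints[:-1], waypoints[1:]):
        # Sample each segment from its start point, excluding its end point.
        t = np.linspace(0.0, 1.0, steps_per_segment, endpoint=False)[:, None]
        segments.append(a + t * (b - a))
    segments.append(waypoints[-1][None, :])  # close with the final anchor
    return np.vstack(segments)

start = np.array([0.0, 0.0, 0.0])
deltas = np.array([[0.1, 0.0, 0.0],
                   [0.0, 0.1, 0.0]])  # two sparse anchors as increments

anchors = anchors_from_deltas(start, deltas)
centerline = corridor_centerline(start, anchors, steps_per_segment=2)
```

A dense centerline of this kind could then be compared against a generated action trajectory with a tolerance radius around each point.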

Sources

arXiv cs.AI — 2026-04-25