CorridorVLA: Sparse Spatial Anchors Improve Robot Action Generation
Researchers propose CorridorVLA, a method for Vision-Language-Action (VLA) models that uses sparse spatial anchors to impose explicit tolerance regions during action generation. The anchors define a corridor that guides a flow-matching action head: trajectories that fall outside the corridor are corrected, while minor deviations within it are permitted. On the LIBERO-Plus benchmark, CorridorVLA improves success rates by 3.4%–12.4% over baselines, and the GR00T-Corr variant reaches an 83.21% success rate. The approach addresses the challenge of injecting spatial guidance explicitly, rather than implicitly through latent features.
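The summary does not give the exact form of the corridor term. As a minimal sketch, assume the corridor is a ball of radius `radius` around each anchor and that points outside it are penalized with a squared hinge on Euclidean distance. The function name `corridor_penalty`, the per-timestep alignment of anchors, and this formulation are hypothetical, not CorridorVLA's published objective. The sketch illustrates the property described above: points inside the tolerance region contribute zero loss and zero gradient, while points outside receive a gradient whose descent direction pulls them back toward the anchor.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def corridor_penalty(
    traj: Sequence[Point], anchors: Sequence[Point], radius: float
) -> Tuple[float, List[List[float]]]:
    """Squared-hinge corridor penalty and its gradient w.r.t. `traj` (hypothetical form).

    traj, anchors: one point per timestep (anchors assumed already aligned to steps).
    radius: non-negative tolerance radius of the corridor around each anchor.
    Returns (loss, grad) with loss = mean_t 0.5 * max(||p_t - a_t|| - radius, 0)^2.
    """
    num_steps = len(traj)
    loss = 0.0
    grad: List[List[float]] = []
    for p, a in zip(traj, anchors):
        diff = [pi - ai for pi, ai in zip(p, a)]
        dist = math.sqrt(sum(d * d for d in diff))
        excess = max(dist - radius, 0.0)  # zero anywhere inside the corridor
        loss += 0.5 * excess ** 2 / num_steps
        # d(loss)/dp = excess * diff / (dist * T); dist > 0 whenever excess > 0.
        scale = excess / (dist * num_steps) if excess > 0.0 else 0.0
        grad.append([scale * d for d in diff])
    return loss, grad


if __name__ == "__main__":
    anchors = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    traj = [[0.05, 0.0], [0.0, 0.3], [0.2, 0.0]]  # inside, outside, outside
    loss, grad = corridor_penalty(traj, anchors, radius=0.1)
    print(round(loss, 6))  # 0.008333
    print(grad[0])         # [0.0, 0.0]: no corrective gradient inside the corridor
```

In training, a term like this would presumably be added to the flow-matching loss with some weight; how CorridorVLA actually combines the two is not specified here.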
Key facts
- CorridorVLA predicts sparse spatial anchors as incremental physical changes (Δ-positions).
- Anchors define a tolerance region in the training objective for action generation.
- Trajectories outside the corridor receive corrective gradients.
- Minor deviations caused by contacts and execution noise are permitted.
- Tested on the LIBERO-Plus benchmark.
- Consistent gains across SmolVLA and GR00T models.
- Success rate improvement of 3.4%–12.4% over baselines.
- GR00T-Corr variant achieves an 83.21% success rate.
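Anchors are predicted as Δ-positions rather than absolute coordinates. A minimal sketch of how such increments could be turned into a corridor centerline follows, assuming each Δ is relative to the previous anchor (the first one to the current end-effector position) and that the sparse anchors are linearly interpolated to one center per action step. Both conventions and the helper names `deltas_to_anchors` and `densify` are hypothetical; the resulting per-step centers could serve as the centers of the tolerance region.

```python
from typing import List, Sequence

Point = List[float]


def deltas_to_anchors(start: Sequence[float], deltas: Sequence[Sequence[float]]) -> List[Point]:
    """Accumulate Δ-positions into absolute anchor positions.

    Hypothetical convention: each Δ is relative to the previous anchor,
    and the first Δ is relative to `start` (e.g. the current end-effector position).
    """
    anchors: List[Point] = []
    current = list(start)
    for delta in deltas:
        current = [c + d for c, d in zip(current, delta)]
        anchors.append(current)
    return anchors


def densify(anchors: Sequence[Point], anchor_steps: Sequence[int], num_steps: int) -> List[Point]:
    """Linearly interpolate sparse anchors to one corridor center per timestep.

    `anchor_steps` must be strictly increasing; timesteps before the first
    anchor or after the last one are clamped to that anchor.
    """
    centers: List[Point] = []
    for t in range(num_steps):
        if t <= anchor_steps[0]:
            centers.append(list(anchors[0]))
        elif t >= anchor_steps[-1]:
            centers.append(list(anchors[-1]))
        else:
            # Index of the last anchor at or before t; the next anchor lies after t.
            k = max(i for i, s in enumerate(anchor_steps) if s <= t)
            t0, t1 = anchor_steps[k], anchor_steps[k + 1]
            w = (t - t0) / (t1 - t0)
            centers.append([(1 - w) * a + w * b for a, b in zip(anchors[k], anchors[k + 1])])
    return centers


if __name__ == "__main__":
    anchors = deltas_to_anchors([0.0, 0.0, 0.0], [[0.1, 0.0, 0.0], [0.1, 0.1, 0.0]])
    print(anchors)  # [[0.1, 0.0, 0.0], [0.2, 0.1, 0.0]]
    centers = densify(anchors, anchor_steps=[0, 4], num_steps=5)
    print(len(centers))  # 5
```

Linear interpolation is only the simplest choice; the source does not say how CorridorVLA relates sparse anchors to dense action steps.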