StereoNav: Enhancing Vision-and-Language Navigation with Target-Location Priors
Researchers propose StereoNav, a framework for improving the real-world performance of Vision-and-Language Navigation (VLN) agents. Current VLN agents degrade in deployment because of perceptual instability and under-specified instructions. StereoNav introduces Target-Location Priors, which provide stable visual guidance across domains and address the simulation-to-reality gap.
Key facts
- VLN is a cornerstone of embodied intelligence.
- Current agents suffer performance degradation when moved from simulation to real-world deployment.
- Degradation is due to perceptual instability (lighting variations, motion blur) and under-specified instructions.
- Existing methods scale up model size and training data.
- The bottleneck is a lack of robust spatial grounding and cross-domain priors.
- StereoNav is a robust Vision-Language-Action framework.
- Target-Location Priors provide stable visual guidance that is invariant across domains.
- The paper is on arXiv with ID 2605.13328.
Entities
Institutions
- arXiv