VLM-Guided Frontier Exploration Boosts Robot Map Coverage by Up to 24%
An autonomous exploration system uses a Vision-Language Model (VLM) for high-level strategic decision-making in unfamiliar environments. The robot builds a multimodal prompt containing its current map and images of the candidate frontiers; the VLM then selects the most promising frontier, relying on contextual spatial reasoning rather than traditional geometric heuristics. In simulation across six indoor environments, the method improves map coverage by up to 24% over existing methods. The pipeline is lightweight, requires no training, and transfers to any robot with standard sensors and an internet connection.
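The prompt-building step described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the cell labels, the function names (`find_frontier_cells`, `build_prompt`, `parse_choice`), the prompt wording, and the image file names are all hypothetical, and the call to the VLM itself is left out because the source does not name a model or API. Frontier cells are taken here to be free cells that border unknown space, which is the usual definition in frontier-based exploration.

```python
import re
from typing import Optional

import numpy as np

# Hypothetical cell labels for this sketch; the source does not specify a map encoding.
FREE, OCCUPIED, UNKNOWN = 0, 1, -1


def find_frontier_cells(grid):
    """Return (row, col) of every free cell with a 4-connected unknown neighbour."""
    free = grid == FREE
    unknown = grid == UNKNOWN
    # Pad with False so cells on the map border have well-defined neighbours.
    padded = np.pad(unknown, 1, constant_values=False)
    near_unknown = (
        padded[:-2, 1:-1]    # neighbour above
        | padded[2:, 1:-1]   # neighbour below
        | padded[1:-1, :-2]  # neighbour to the left
        | padded[1:-1, 2:]   # neighbour to the right
    )
    rows, cols = np.nonzero(free & near_unknown)
    return list(zip(rows.tolist(), cols.tolist()))


def build_prompt(map_image, frontier_images):
    """Bundle the instruction text with the map image and one image per frontier."""
    n = len(frontier_images)
    text = (
        "You are guiding a robot that is exploring an unknown indoor space. "
        "Image 0 is the current map. "
        f"Images 1 to {n} each show one candidate frontier. "
        "Reply with the number of the frontier most likely to reveal new area."
    )
    return {"text": text, "images": [map_image, *frontier_images]}


def parse_choice(reply, n_frontiers) -> Optional[int]:
    """Read the first integer in the reply; return None if it is missing or out of range."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    choice = int(match.group())
    return choice if 1 <= choice <= n_frontiers else None


# Toy 3x3 map: the two right-hand cells of the top rows are still unknown.
grid = np.array([
    [FREE, FREE, UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE, FREE],
])
frontiers = find_frontier_cells(grid)
prompt = build_prompt("map.png", ["frontier_1.png", "frontier_2.png"])
```

Returning `None` from `parse_choice` lets the caller fall back to a geometric rule, such as the nearest frontier, when the model's reply is unusable. A full system would also group adjacent frontier cells into distinct frontiers before capturing one image per frontier; that step is omitted here.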
Key facts
- VLM performs high-level strategic decision-making for robot exploration.
- Robot generates multimodal prompt with map and visual imagery of frontiers.
- VLM selects most promising frontier using contextual spatial reasoning.
- Validated in simulation across six indoor environments.
- Improves map coverage by up to 24% over existing methods.
- Pipeline is lightweight and training-free.
- Transferable to any robot with standard sensors and internet connection.
- Addresses long-standing challenge of autonomous exploration in hazardous environments.
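To make the coverage figure concrete, here is a minimal Python sketch of one way map coverage and a relative improvement could be computed. The definitions are assumptions: the summary does not say how coverage is measured or whether "up to 24%" is a relative or an absolute gain. The function names and the numbers in the example are invented for illustration and are not results from the work.

```python
import numpy as np


def coverage(explored, reachable):
    """Fraction of reachable cells the robot has observed (assumed definition)."""
    total = int(reachable.sum())
    if total == 0:
        return 0.0
    return float((explored & reachable).sum()) / total


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, e.g. 0.24 for a 24% improvement."""
    return (new - baseline) / baseline


# Illustrative values only: a 2x2 map where every cell is reachable.
reachable = np.array([[True, True], [True, True]])
explored = np.array([[True, True], [True, False]])
observed = coverage(explored, reachable)

# A made-up baseline of 50% coverage against a made-up 62% for the new method.
gain = relative_improvement(0.62, 0.50)
```

Under the relative reading, moving from 50% to 62% coverage is a 24% improvement; under the absolute reading the same change would be reported as 12 percentage points.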
Entities
Institutions
- arXiv