LLM Gains in ObjectNav Largely Due to Geometry, Not Language
A recent study questions the belief that large language models (LLMs) are the main driver of recent improvements in zero-shot object navigation (ObjectNav). The authors re-evaluate the instruction-guided pipeline InstructNav in a detector-controlled setting and introduce two training-free variants: the geometry-only Frontier Proximity Explorer (FPE) and the lightweight Semantic-Heuristic Frontier (SHF), which uses simple LLM votes over frontiers. On the HM3D and MP3D benchmarks, FPE matches or exceeds the detector-controlled instruction follower while running faster and making no API calls. SHF reaches comparable accuracy with a smaller, localized language prior. The results suggest that well-designed frontier geometry accounts for much of the reported gains, and that language works better as a light heuristic than as an end-to-end planner. The code is available via the link in the arXiv listing.
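To make the geometry-only idea concrete, the sketch below shows generic frontier-based exploration on a 2D occupancy grid: find free cells that border unexplored space, then head for the one with the shortest traversable path. It is an illustration under stated assumptions, not the FPE implementation from the study. The grid encoding, the 4-connected neighborhood, and the function names `find_frontiers` and `nearest_frontier` are all hypothetical choices made for this example.

```python
from collections import deque

# Assumed cell encoding for this sketch (not taken from the study).
FREE, OCCUPIED, UNKNOWN = 0, 1, -1
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def find_frontiers(grid):
    """Return free cells that touch at least one unknown cell, in row-major order."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


def nearest_frontier(grid, start):
    """Breadth-first search over free cells from `start`.

    Returns (frontier_cell, path_length_in_steps), or None if no frontier
    is reachable.
    """
    targets = set(find_frontiers(grid))
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) in targets:
            return (r, c), dist
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == FREE and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None


if __name__ == "__main__":
    # Toy map: the rightmost column is unexplored.
    grid = [
        [0, 0, 1, -1],
        [0, 1, 0, -1],
        [0, 0, 0, -1],
    ]
    print(find_frontiers(grid))            # [(1, 2), (2, 2)]
    print(nearest_frontier(grid, (0, 0)))  # ((2, 2), 4)
```

Nothing in this loop calls a language model, which is consistent with the reported absence of API calls for the geometry-only variant. A real agent would build the grid from depth observations and replan as the map grows.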
Key facts
- Study re-evaluates InstructNav under a detector-controlled setting.
- Introduces FPE (geometry-only) and SHF (lightweight LLM heuristic).
- FPE matches or exceeds instruction follower without API calls.
- SHF achieves comparable accuracy with smaller language prior.
- Results suggest frontier geometry, more than language, accounts for much of the ObjectNav gains.
- Language is best used as a light heuristic, not as an end-to-end planner.
- Benchmarks used: HM3D and MP3D.
- Code available at arXiv link.
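The "light heuristic" role of language described in the facts above can be pictured as a small adjustment to a geometric ranking rather than a full plan. The sketch below is one possible reading, not the SHF method itself. The scoring rule (path length minus a weighted vote count), the `vote_weight` parameter, and the function name `pick_frontier` are assumptions for illustration, and the votes are hard-coded stand-ins for model output; no language model is called.

```python
def pick_frontier(distances, votes, vote_weight=1.0):
    """Choose the frontier with the lowest cost.

    distances: dict mapping frontier cell -> path length from the agent.
    votes:     dict mapping frontier cell -> number of votes it received
               (hypothetical stand-in for a language model's preferences).
    Cost is an assumed rule: distance - vote_weight * votes.
    Ties resolve to the frontier listed first in `distances`.
    """
    if not distances:
        return None
    return min(
        distances,
        key=lambda cell: distances[cell] - vote_weight * votes.get(cell, 0),
    )


if __name__ == "__main__":
    distances = {(2, 2): 4, (1, 2): 5}
    votes = {(1, 2): 3}  # e.g. the farther frontier looks more promising

    # With votes weighted in, the farther but favored frontier wins (cost 2 vs 4).
    print(pick_frontier(distances, votes, vote_weight=1.0))  # (1, 2)
    # With the language term switched off, the choice is purely geometric.
    print(pick_frontier(distances, votes, vote_weight=0.0))  # (2, 2)
```

Setting `vote_weight` to zero recovers a purely geometric choice, which makes it straightforward to compare the two behaviors on the same map.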
Entities
Institutions
- arXiv