LLM Gains in ObjectNav Largely Due to Geometry, Not Language

ai-technology · 2026-05-07

A recent study challenges the assumption that large language models (LLMs) are the main driver of recent improvements in zero-shot object navigation (ObjectNav). The authors re-evaluated the instruction-guided pipeline InstructNav in a detector-controlled setting and introduced two training-free variants: the geometry-only Frontier Proximity Explorer (FPE) and the lightweight Semantic-Heuristic Frontier (SHF), which uses simple LLM votes over frontiers. On the HM3D and MP3D benchmarks, FPE matched or exceeded the detector-controlled instruction follower while running faster and making no API calls, and SHF reached comparable accuracy with a smaller language prior. The results suggest that well-designed frontier geometry accounts for much of the reported progress, and that language is more useful as a light heuristic than as an end-to-end planner. Code is available via the paper's arXiv listing.
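The summary describes FPE only as geometry-focused, so the following is not the paper's method. As a rough illustration of what frontier-based exploration involves, here is a minimal Python sketch of classic frontier detection with nearest-frontier selection. It assumes a 2D occupancy grid where 0 is free, 1 is occupied, and -1 is unknown; the grid encoding and the helper names find_frontiers and nearest_frontier are hypothetical.

```python
from collections import deque

# Hypothetical grid encoding (an assumption, not taken from the study).
FREE, OCCUPIED, UNKNOWN = 0, 1, -1
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connected moves


def find_frontiers(grid):
    """Return free cells, in row-major order, that touch at least one unknown cell."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers


def nearest_frontier(grid, start):
    """Breadth-first search over free cells from start.

    Returns (cell, path_length) for the closest reachable frontier,
    or (None, None) if no frontier is reachable.
    """
    targets = set(find_frontiers(grid))
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) in targets:
            return (r, c), dist
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == FREE and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None, None


if __name__ == "__main__":
    grid = [
        [0, 0, 0, -1],
        [0, 1, 0, -1],
        [0, 0, 0, 0],
    ]
    print(find_frontiers(grid))            # [(0, 2), (1, 2), (2, 3)]
    print(nearest_frontier(grid, (0, 0)))  # ((0, 2), 2)
```

Measuring proximity as path length rather than straight-line distance keeps the agent from choosing a frontier that is close but behind a wall; whether FPE measures proximity this way is not stated in the summary.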

Key facts

Study re-evaluates InstructNav in a detector-controlled setting.
Introduces FPE (geometry-only) and SHF (lightweight LLM heuristic).
FPE matches or exceeds the instruction follower without API calls.
SHF achieves comparable accuracy with a smaller language prior.
Results suggest geometry, not language, largely drives ObjectNav gains.
Language is best used as a light heuristic, not an end-to-end planner.
Benchmarks used: HM3D and MP3D.
Code available via the paper's arXiv listing.
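The summary says only that SHF uses simple LLM votes over frontiers. One hedged way to picture combining such votes with geometry is a weighted score that trades vote count against path length. In the Python sketch below, the Frontier type, tally_votes, select_frontier, the linear score, and vote_weight are all hypothetical, and the LLM call itself is replaced by a list of already-collected choices.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Frontier:
    cell: tuple      # (row, col) on the map
    distance: float  # path length from the agent, e.g. from a BFS


def tally_votes(choices):
    """Count how often each frontier cell was chosen.

    `choices` stands in for hypothetical LLM outputs, one chosen cell per query.
    """
    return Counter(choices)


def select_frontier(frontiers, votes, vote_weight=1.0):
    """Return the frontier maximizing vote_weight * votes - distance.

    Cells with no votes count as 0. With vote_weight=0 this reduces to
    picking the nearest frontier (pure geometry). Ties go to the earlier item.
    """
    if not frontiers:
        return None
    return max(frontiers, key=lambda f: vote_weight * votes.get(f.cell, 0) - f.distance)


if __name__ == "__main__":
    frontiers = [Frontier((0, 2), 2.0), Frontier((2, 3), 5.0)]
    votes = tally_votes([(2, 3), (2, 3), (2, 3), (2, 3)])
    print(select_frontier(frontiers, votes).cell)                   # (2, 3)
    print(select_frontier(frontiers, votes, vote_weight=0.0).cell)  # (0, 2)
```

Treating the language signal as a bounded bonus on top of a geometric cost matches the article's conclusion that language works better as a light heuristic than as a full planner.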
