ARTFEED — Contemporary Art Intelligence

P2DNav: Hierarchical Framework for Zero-Shot Vision-and-Language Navigation

ai-technology · 2026-05-20

P2DNav is a newly proposed hierarchical framework for zero-shot vision-and-language navigation (VLN). It decomposes navigation into two stages: panoramic direction selection and downview local grounding. The framework has three components: P2D, SDM, and RRM. P2D first selects a direction from a 360-degree panorama, then predicts a pixel-level target from a downview RGB image. Separating these two decisions aims to reduce the errors that arise from entangled reasoning when navigating unseen environments. The research is available on arXiv under the identifier 2605.19634.
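The two-stage decomposition can be illustrated with a minimal sketch. This is not the paper's implementation: the names `select_direction`, `ground_pixel`, `navigate_step`, and `Waypoint` are hypothetical, the scores stand in for whatever model output would rank panoramic views and downview pixels, and the even split of the panorama into views is an assumption made for illustration. SDM and RRM are not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Waypoint:
    heading_deg: float      # direction chosen in stage 1
    pixel: Tuple[int, int]  # (row, col) target chosen in stage 2


def select_direction(view_scores: Sequence[float]) -> float:
    """Stage 1 (hypothetical): return the heading of the best-scoring view.

    Assumes the 360-degree panorama is split into len(view_scores)
    evenly spaced views, with view 0 at heading 0 degrees.
    """
    if not view_scores:
        raise ValueError("need at least one view score")
    best = max(range(len(view_scores)), key=lambda i: view_scores[i])
    return best * 360.0 / len(view_scores)


def ground_pixel(score_map: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Stage 2 (hypothetical): return (row, col) of the best-scoring pixel."""
    best_pos = None
    best_score = float("-inf")
    for r, row in enumerate(score_map):
        for c, score in enumerate(row):
            if score > best_score:
                best_pos, best_score = (r, c), score
    if best_pos is None:
        raise ValueError("score map is empty")
    return best_pos


def navigate_step(
    view_scores: Sequence[float],
    downview_scores_for: Callable[[float], Sequence[Sequence[float]]],
) -> Waypoint:
    """Run both stages in order: choose a heading, then ground a pixel.

    downview_scores_for is a placeholder for turning to the chosen heading,
    capturing a downview RGB image, and scoring its pixels.
    """
    heading = select_direction(view_scores)
    return Waypoint(heading, ground_pixel(downview_scores_for(heading)))
```

Keeping the stages as separate functions mirrors the idea described above: the direction decision is made before any pixel-level decision, so each stage handles one simpler question.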

Key facts

  • P2DNav is a hierarchical framework for zero-shot VLN
  • It decomposes navigation into panoramic direction selection and downview local grounding
  • Components: P2D, SDM, RRM
  • P2D selects a direction from a 360-degree panorama
  • P2D then predicts a pixel-level target from a downview RGB image
  • Aims to reduce errors from entangled reasoning
  • Published on arXiv:2605.19634
  • Addresses zero-shot VLN in unseen environments

Entities

Institutions

  • arXiv

Sources