PInVerify: Benchmark for Active Instance Verification in Embodied AI

other · 2026-06-01

Researchers introduce Active Instance Verification (AIV), a task in which an embodied agent selects viewpoints around a candidate object to verify whether it matches a fine-grained natural-language description. The task addresses a gap in object navigation: reaching a target object does not guarantee that the correct instance has been identified, because instances can differ in subtle attributes. The authors formalize AIV as a finite-horizon decision process and present PInVerify, an offline benchmark with 3,000 evaluation episodes across 18 object categories. The benchmark uses multi-view captures and a 6-sector navigation topology that includes trap views (navigable but uninformative) and unreachable sectors. Baseline pipelines include a training-free approach and a LoRA-fine-tuned end-to-end agent built on open-source multimodal models.
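To make the finite-horizon framing concrete, the following is a minimal toy sketch in Python, not the benchmark's actual code or API. The `Episode` fields, the step budget, the `sweep_policy` baseline, and the oracle-style check of whether a view is informative are all assumptions made for illustration. The sketch also ignores sector adjacency and stands in for the multimodal verifier with a ground-truth lookup.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

NUM_SECTORS = 6  # the summary describes a 6-sector navigation topology


@dataclass
class Episode:
    """Hypothetical episode record; field names are illustrative only."""
    reachable: Set[int]    # sectors the agent can navigate to
    informative: Set[int]  # sectors whose view shows the described attribute
    is_match: bool         # ground truth: does the object match the description?
    horizon: int = 3       # assumed budget on the number of moves
    start: int = 0         # sector where the agent begins


def trap_sectors(ep: Episode) -> Set[int]:
    """Trap views: navigable sectors that reveal nothing useful."""
    return ep.reachable - ep.informative


def sweep_policy(ep: Episode, visited: List[int]) -> Optional[int]:
    """Assumed baseline: go to the lowest-numbered unvisited reachable sector."""
    for sector in range(NUM_SECTORS):
        if sector in ep.reachable and sector not in visited:
            return sector
    return None  # nothing left to explore


Policy = Callable[[Episode, List[int]], Optional[int]]


def run_episode(ep: Episode, policy: Policy) -> Tuple[str, List[int]]:
    """Roll out one episode and return (verdict, visited sectors)."""
    visited = [ep.start]
    moves = 0
    while True:
        # Oracle stand-in for a verifier: an informative view settles the answer.
        if visited[-1] in ep.informative:
            return ("match" if ep.is_match else "no_match"), visited
        if moves == ep.horizon:
            return "uncertain", visited  # budget exhausted without evidence
        nxt = policy(ep, visited)
        if nxt is None:
            return "uncertain", visited  # no reachable sector remains
        visited.append(nxt)
        moves += 1


# Sector 5 is informative but unreachable; sectors 0, 1, 2 are trap views.
episode = Episode(reachable={0, 1, 2, 4}, informative={4, 5}, is_match=True)
verdict, path = run_episode(episode, sweep_policy)
```

In this toy setup the sweep wastes moves on trap views before reaching the one informative sector, which suggests why viewpoint selection matters under a fixed budget: lowering `horizon` to 2 leaves the same agent without a verdict.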

Key facts

Active Instance Verification (AIV) is a new task for embodied agents.
AIV requires agents to actively select viewpoints to verify fine-grained object attributes.
PInVerify benchmark includes 3,000 evaluation episodes across 18 object categories.
The benchmark uses multi-view captures with a 6-sector navigation topology.
Trap views (navigable but uninformative) and unreachable sectors are included.
AIV is formalized as a finite-horizon decision process.
Baselines include a training-free pipeline and a LoRA-fine-tuned end-to-end agent.
The work is published on arXiv with ID 2605.30639.
