PInVerify: Benchmark for Active Instance Verification in Embodied AI
Researchers introduce Active Instance Verification (AIV), a task in which an embodied agent selects viewpoints around a candidate object to verify whether it matches a fine-grained natural-language description. The task addresses a gap in object navigation: reaching a target object does not guarantee that the agent has identified the correct instance, because instances can differ in subtle attributes. The authors formalize AIV as a finite-horizon decision process and present PInVerify, an offline benchmark with 3,000 evaluation episodes across 18 object categories. The benchmark is built from multi-view captures and a 6-sector navigation topology that includes trap views and unreachable sectors. Baselines include a training-free pipeline and a LoRA-fine-tuned end-to-end agent built on open-source multimodal models.
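To make the finite-horizon framing concrete, the following is a minimal Python sketch of one verification episode. Only the six-sector count, the existence of trap views (navigable but uninformative), and the existence of unreachable sectors come from the summary above. Everything else is an assumption made for illustration: the ring adjacency between sectors, the `Episode` fields, the action encoding, the step budget, and the toy sweep policy are hypothetical and are not the benchmark's actual interface or any of its baselines. A view is reduced here to a single value (the ground truth if the deciding attribute is visible, `None` otherwise), whereas a real agent would receive an image.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

NUM_SECTORS = 6  # sector count from the benchmark description; all else is assumed


@dataclass(frozen=True)
class Episode:
    """Hypothetical offline episode around one candidate object."""
    is_match: bool                # ground truth: candidate matches the description
    informative: FrozenSet[int]   # sectors whose view shows the deciding attribute
    unreachable: FrozenSet[int]   # sectors the agent cannot enter
    start: int = 0
    horizon: int = 6              # assumed step budget (the finite horizon)


def observe(ep: Episode, sector: int) -> Optional[bool]:
    """Stand-in for a captured view: the truth if informative, None for a trap view."""
    return ep.is_match if sector in ep.informative else None


def step(ep: Episode, sector: int, target: int) -> int:
    """Move to an adjacent, reachable sector (assumed ring layout); otherwise stay."""
    adjacent = {(sector - 1) % NUM_SECTORS, (sector + 1) % NUM_SECTORS}
    if target in adjacent and target not in ep.unreachable:
        return target
    return sector


def run_episode(ep: Episode, policy) -> Optional[bool]:
    """Roll out a policy; return its verdict, or None if the budget runs out."""
    sector = ep.start
    for _ in range(ep.horizon):
        action = policy(sector, observe(ep, sector))
        if action == "accept" or action == "reject":
            return action == "accept"
        sector = step(ep, sector, action)  # an int action names the target sector
    return None


def make_sweep_policy():
    """Toy policy: walk the ring, turn around when blocked, decide at first evidence."""
    state = {"direction": 1, "last": None}

    def policy(sector: int, obs: Optional[bool]):
        if obs is not None:
            return "accept" if obs else "reject"
        if state["last"] == sector:  # the previous move failed, so reverse
            state["direction"] *= -1
        state["last"] = sector
        return (sector + state["direction"]) % NUM_SECTORS

    return policy


episode = Episode(
    is_match=True,
    informative=frozenset({4}),
    unreachable=frozenset({2}),
)
verdict = run_episode(episode, make_sweep_policy())
```

In this toy episode the sweep policy wastes a step on the blocked sector, passes through several trap views, and reaches the informative sector only on its last allowed step. That illustrates why viewpoint selection under a fixed budget matters: with a shorter horizon, the same policy would return no verdict at all.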
Key facts
- Active Instance Verification (AIV) is a new task for embodied agents.
- AIV requires agents to actively select viewpoints to verify fine-grained object attributes.
- PInVerify benchmark includes 3,000 evaluation episodes across 18 object categories.
- The benchmark uses multi-view captures with a 6-sector navigation topology.
- Trap views (navigable but uninformative) and unreachable sectors are included.
- AIV is formalized as a finite-horizon decision process.
- Baselines include a training-free pipeline and a LoRA-fine-tuned end-to-end agent.
- The work is published on arXiv with ID 2605.30639.
Entities
Institutions
- arXiv