Verifier-Guided Action Selection Improves Embodied Agent Robustness
Researchers have introduced VegAS (Verifier-Guided Action Selection), a test-time framework intended to make embodied agents built on Multimodal Large Language Models (MLLMs) more robust. Instead of committing to a single action, VegAS samples an ensemble of candidate actions and uses a generative verifier to select the most reliable one, leaving the original policy unmodified. The authors report that using an off-the-shelf MLLM as the verifier does not improve performance, which led them to develop an LLM-driven data synthesis strategy. The paper is available on arXiv under the identifier 2605.12620.
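The sample-then-verify loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `select_action`, `make_toy_policy`, and `toy_verifier` are hypothetical names, the policy and verifier are toy stand-ins rather than MLLMs, and a scalar score is assumed in place of whatever output the paper's generative verifier actually produces.

```python
import itertools
from typing import Callable, List


def select_action(
    observation: str,
    sample_action: Callable[[str], str],
    score_action: Callable[[str, str], float],
    num_candidates: int = 5,
) -> str:
    """Sample candidates from an unmodified policy and return the one
    the verifier scores highest (hypothetical interface)."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates: List[str] = [sample_action(observation) for _ in range(num_candidates)]
    # Deduplicate while preserving first-seen order so each distinct
    # action is scored only once.
    unique = list(dict.fromkeys(candidates))
    scores = [score_action(observation, action) for action in unique]
    # max() returns the first index on ties, so earlier samples win.
    best_index = max(range(len(unique)), key=scores.__getitem__)
    return unique[best_index]


def make_toy_policy(actions: List[str]) -> Callable[[str], str]:
    """Toy stand-in for a policy: cycles through a fixed list of actions."""
    cycle = itertools.cycle(actions)

    def sample(observation: str) -> str:
        return next(cycle)

    return sample


def toy_verifier(observation: str, action: str) -> float:
    """Toy stand-in for a verifier (assumption): score 1.0 if the action's
    target object is mentioned in the observation, else 0.0."""
    target = action.split()[-1]
    return 1.0 if target in observation else 0.0


policy = make_toy_policy(["open drawer", "pick mug", "open fridge"])
chosen = select_action("a mug sits on the table", policy, toy_verifier, num_candidates=3)
print(chosen)  # pick mug
```

Because selection happens outside the policy, the policy's weights are never touched in this sketch; only the number of sampled candidates and the scoring function change the outcome.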
Key facts
- VegAS stands for Verifier-Guided Action Selection.
- The framework operates at test time only.
- It samples an ensemble of candidate actions.
- A generative verifier selects the most reliable action.
- Using an off-the-shelf MLLM as the verifier yielded no improvement.
- An LLM-driven data synthesis strategy was developed.
- The paper is on arXiv: 2605.12620.
- The approach targets out-of-distribution scenarios.
Entities
Institutions
- arXiv