Verifier-Guided Action Selection Improves Embodied Agent Robustness

ai-technology · 2026-05-14

A team of researchers has introduced VegAS (Verifier-Guided Action Selection), a test-time framework intended to make embodied agents built on Multimodal Large Language Models (MLLMs) more robust. Instead of committing to a single predicted action, VegAS samples an ensemble of candidate actions and uses a generative verifier to pick the most reliable one, leaving the original policy unchanged. The authors report that using an off-the-shelf MLLM as the verifier yields no improvement, which led them to develop an LLM-driven data synthesis strategy. The paper is available on arXiv under the identifier 2605.12620.

Key facts

VegAS stands for Verifier-Guided Action Selection.
The framework operates at test time only.
It samples an ensemble of candidate actions.
A generative verifier selects the most reliable action.
Off-the-shelf MLLM verifiers showed no improvement.
An LLM-driven data synthesis strategy was developed.
The paper is on arXiv: 2605.12620.
The approach targets out-of-distribution scenarios.
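
The sample-then-verify loop described in the facts above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the paper's implementation: `select_action`, `toy_policy`, `toy_verifier`, the action names, and the scores are all hypothetical stand-ins. In particular, the generative verifier is assumed here to reduce to a single numeric score per candidate, which the summary does not specify.

```python
import random
from typing import Callable, Hashable, List, Tuple

Action = Hashable


def select_action(
    observation: str,
    sample_action: Callable[[str, random.Random], Action],
    score_action: Callable[[str, Action], float],
    num_candidates: int = 8,
    seed: int = 0,
) -> Tuple[Action, List[Action]]:
    """Sample candidates from a frozen policy and keep the best-scored one.

    Hypothetical helper: the policy is only queried, never updated.
    """
    rng = random.Random(seed)
    # 1. Draw an ensemble of candidate actions from the unchanged policy.
    candidates = [sample_action(observation, rng) for _ in range(num_candidates)]
    # 2. Drop duplicates (keeping first-seen order) so each action is scored once.
    unique = list(dict.fromkeys(candidates))
    # 3. Let the verifier rank the candidates and return the top one.
    best = max(unique, key=lambda action: score_action(observation, action))
    return best, unique


def toy_policy(observation: str, rng: random.Random) -> str:
    """Stand-in for an MLLM policy: picks an action at random (assumption)."""
    return rng.choice(["move_forward", "turn_left", "turn_right", "pick_up"])


def toy_verifier(observation: str, action: str) -> float:
    """Stand-in for a verifier: fixed, made-up reliability scores (assumption)."""
    scores = {"move_forward": 0.2, "turn_left": 0.1, "turn_right": 0.3, "pick_up": 0.9}
    return scores[action]


if __name__ == "__main__":
    chosen, pool = select_action("a mug is on the table", toy_policy, toy_verifier)
    print("candidates:", pool)
    print("chosen:", chosen)
```

Because selection happens entirely outside the policy, a sketch like this leaves the policy's weights untouched. Only the number of sampled candidates and the quality of the verifier affect the outcome.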
