VGAS: Value-Guided Action-Chunk Selection for Few-Shot VLA Adaptation

ai-technology · 2026-05-25

Researchers propose VGAS (Value-Guided Action-chunk Selection), a framework for few-shot adaptation of Vision-Language-Action (VLA) models. VLA models integrate multimodal reasoning with physical control, but adapting them to new tasks from only a few demonstrations is unreliable because of geometric ambiguities. VGAS addresses this in two stages: a fine-tuned VLA acts as a high-recall proposal generator, and a Transformer critic called Q-Chunk-Former selects geometrically precise action chunks from those proposals at inference time via best-of-N selection. The approach aims to improve both semantic faithfulness and geometric precision. The paper is available on arXiv (2602.07399).

Key facts

VGAS stands for Value-Guided Action-chunk Selection.
It targets few-shot adaptation of Vision-Language-Action (VLA) models.
VLA models bridge multimodal reasoning with physical control.
Adaptation with scarce demonstrations is unreliable due to geometric ambiguities.
VGAS uses a fine-tuned VLA as a high-recall proposal generator.
It employs a Transformer critic called Q-Chunk-Former.
Action chunks are chosen at inference time via best-of-N selection.
The paper is on arXiv with ID 2602.07399.
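
The propose-then-select loop described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: `select_best_chunk`, `toy_propose`, `toy_critic`, and `TARGET` are hypothetical names, the proposal generator is replaced by Gaussian noise around a fixed target action, and the critic is replaced by negative squared distance to that target. Only the overall shape (sample N candidate chunks, score each, keep the highest-scoring one) reflects the best-of-N idea.

```python
import random
from typing import Callable, List, Tuple

# An action chunk: a short sequence of actions (horizon x action_dim).
ActionChunk = List[List[float]]


def select_best_chunk(
    obs,
    instruction: str,
    propose: Callable[[object, str], ActionChunk],
    critic: Callable[[object, str, ActionChunk], float],
    n: int = 8,
) -> Tuple[ActionChunk, float]:
    """Sample n candidate chunks and return the one the critic scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [propose(obs, instruction) for _ in range(n)]
    scores = [critic(obs, instruction, chunk) for chunk in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best], scores[best]


# --- Toy stand-ins (assumptions for illustration only) ---
TARGET = [0.5, -0.2]          # hypothetical "geometrically correct" action
_rng = random.Random(0)       # fixed seed so runs are repeatable


def toy_propose(obs, instruction: str) -> ActionChunk:
    # Stand-in for a fine-tuned VLA: noisy samples around the target.
    return [[t + _rng.gauss(0.0, 0.3) for t in TARGET] for _ in range(4)]


def toy_critic(obs, instruction: str, chunk: ActionChunk) -> float:
    # Stand-in for a learned critic: higher means closer to the target.
    return -sum((a - t) ** 2 for step in chunk for a, t in zip(step, TARGET))


chunk, value = select_best_chunk(
    None, "pick up the cup", toy_propose, toy_critic, n=16
)
print(len(chunk), round(value, 3))
```

In this toy setup, raising `n` gives the critic more candidates to choose from, so the selected chunk tends to land closer to the target; in a real system each extra candidate also adds proposal and scoring cost at inference time.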
