VGAS: Value-Guided Action-Chunk Selection for Few-Shot VLA Adaptation
Researchers propose VGAS (Value-Guided Action-chunk Selection), a framework for few-shot adaptation of Vision-Language-Action (VLA) models. VLA models integrate multimodal reasoning with physical control, but adapting them to new tasks from only a few demonstrations is unreliable because of geometric ambiguities. VGAS addresses this by pairing a fine-tuned VLA, used as a high-recall proposal generator, with a Transformer critic called Q-Chunk-Former that picks the most geometrically precise action chunk among the proposals at inference time (best-of-N selection). The approach aims to improve both semantic faithfulness and geometric precision. The paper is available on arXiv (2602.07399).
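The propose-then-select loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `best_of_n`, `toy_propose`, and `toy_score` are hypothetical, the toy sampler merely stands in for the fine-tuned VLA, and the toy scorer stands in for Q-Chunk-Former.

```python
import random
from typing import Callable, List, Sequence, Tuple

# An action chunk is a short sequence of actions: horizon x action_dim.
ActionChunk = List[List[float]]


def best_of_n(
    propose: Callable[[Sequence[float]], ActionChunk],
    score: Callable[[Sequence[float], ActionChunk], float],
    observation: Sequence[float],
    n: int = 8,
) -> Tuple[ActionChunk, float]:
    """Sample n candidate chunks and return the one the critic scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [propose(observation) for _ in range(n)]
    scores = [score(observation, chunk) for chunk in candidates]
    best = max(range(n), key=scores.__getitem__)
    return candidates[best], scores[best]


def toy_propose(rng: random.Random, horizon: int = 4):
    """Hypothetical stand-in for the proposal generator: noisy copies of the observation."""
    def propose(observation: Sequence[float]) -> ActionChunk:
        return [[o + rng.gauss(0.0, 0.5) for o in observation] for _ in range(horizon)]
    return propose


def toy_score(observation: Sequence[float], chunk: ActionChunk) -> float:
    """Hypothetical stand-in for the critic: higher when actions stay close to the observation."""
    return -sum((a - o) ** 2 for step in chunk for a, o in zip(step, observation))


rng = random.Random(0)
obs = [0.1, -0.2]
chunk, value = best_of_n(toy_propose(rng), toy_score, obs, n=16)
print(f"selected chunk with critic value {value:.3f}")
```

In this sketch the generator only needs to include a good chunk somewhere among its N samples (high recall), while the scorer carries the burden of ranking them, which mirrors the division of labor the summary describes.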
Key facts
- VGAS stands for Value-Guided Action-chunk Selection.
- It targets few-shot adaptation of Vision-Language-Action (VLA) models.
- VLA models bridge multimodal reasoning with physical control.
- Adaptation with scarce demonstrations is unreliable due to geometric ambiguities.
- VGAS uses a fine-tuned VLA as a high-recall proposal generator.
- It employs a Transformer critic called Q-Chunk-Former.
- At inference time, the critic selects among the proposed action chunks (best-of-N selection).
- The paper is on arXiv with ID 2602.07399.
Entities
Institutions
- arXiv