GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents

ai-technology · 2026-05-22

GROW is a reinforcement learning framework that adapts Group Relative Policy Optimization (GRPO) to multi-turn tasks for open-world vision-language model (VLM) agents. Standard GRPO uses complete trajectories as training samples, which leads to long contexts and added noise. GROW instead decomposes trajectories into state-action samples and computes advantages among those samples rather than treating each trajectory as a single unit. The approach is described in a paper on arXiv (ID 2605.20246), listed with the announcement type "cross". The research addresses the difficulty of applying advanced RL algorithms to multi-turn visual perception and action execution, both of which open-world applications require. The authors present a surrogate analysis showing that, although computing advantages over grouped state-action samples departs from standard GRPO, the method remains theoretically sound.
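To make the contrast concrete, the following is a minimal sketch of the two ways of forming advantages, not the paper's implementation. It assumes a scalar outcome reward per trajectory, assumes each state-action sample inherits the reward of the trajectory it came from, and reuses GRPO's usual mean/standard-deviation normalization. The names `Step`, `Trajectory`, `state_action_samples`, and `sample_level_advantages` are hypothetical, and the summary does not specify how GROW groups samples or assigns rewards to them.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Step:
    state: str   # hypothetical stand-in for an observation (e.g. a frame id)
    action: str  # hypothetical stand-in for the agent's action at that state


@dataclass
class Trajectory:
    steps: list    # list of Step objects from one rollout
    reward: float  # assumed scalar outcome reward for the whole rollout


def normalize(values, eps=1e-8):
    """Group-relative normalization: subtract the group mean, divide by the std."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def trajectory_level_advantages(group):
    """Standard GRPO style: one advantage per full trajectory in the group."""
    return normalize([t.reward for t in group])


def state_action_samples(group):
    """Flatten trajectories into (state, action, reward) samples.

    Assumption for illustration: every step inherits its trajectory's reward.
    """
    return [(s.state, s.action, t.reward) for t in group for s in t.steps]


def sample_level_advantages(samples):
    """Sketch of the decomposed variant: normalize across state-action samples."""
    return normalize([reward for _, _, reward in samples])


if __name__ == "__main__":
    group = [
        Trajectory([Step("s0", "a0")], reward=1.0),
        Trajectory([Step("s0", "b0"), Step("s1", "b1"), Step("s2", "b2")], reward=0.0),
    ]
    print(trajectory_level_advantages(group))
    print(sample_level_advantages(state_action_samples(group)))
```

In this sketch each training sample is a single state-action pair, so its context stays short regardless of trajectory length. Because longer trajectories contribute more samples, the normalization statistics differ from the trajectory-level ones.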

Key facts

GROW is a reinforcement learning framework for open-world VLM agents.
It adapts Group Relative Policy Optimization (GRPO) for multi-turn tasks.
Standard GRPO requires full trajectories as training samples.
GROW decomposes trajectories into state-action samples.
Advantages are computed between state-action samples, not full trajectories.
The paper is on arXiv with ID 2605.20246.
The arXiv announcement type is "cross".
The work addresses multi-turn visual perception and action execution.
