Co-Evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding

other · 2026-04-25

A new reinforcement learning framework, Propose-then-Critic, co-evolves a proposer and a visual critic to improve GUI grounding, the task of mapping natural language instructions to precise pixel coordinates. The approach targets visually homogeneous elements and dense layouts, where static self-consistency strategies struggle, and replaces those strategies with a learnable selection mechanism: the critic evaluates candidate proposals rendered on the screenshot and selects the target. A maturity-aware adaptive co-evolutionary reinforcement learning scheme jointly optimizes both components to overcome the disparity between grounding and critiquing capabilities. The paper is available on arXiv under reference 2604.21268.

Key facts

arXiv paper 2604.21268 proposes the Propose-then-Critic framework for GUI grounding.
The framework co-evolves a proposer and a visual critic via reinforcement learning.
It replaces static self-consistency strategies with a learnable selection mechanism.
It addresses visually homogeneous elements and dense layouts in GUI grounding.
Training uses maturity-aware adaptive co-evolutionary reinforcement learning.
The critic evaluates proposals rendered on screenshots to select the optimal target.
Joint optimization overcomes the disparity between grounding and critiquing capabilities.
The paper was published on arXiv with announcement type "cross" (a cross-listing).
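
The contrast between static self-consistency and a learnable selection step can be illustrated with a minimal Python sketch. It is not the paper's implementation. The grid-vote baseline is just one simple form of self-consistency, and `critic_select`, `toy_critic`, and the sample coordinates are hypothetical names and values introduced for illustration. In the paper the critic is a learned model that looks at proposals rendered on the screenshot; here a plain scoring function stands in for it.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates on the screenshot


def self_consistency_select(candidates: Sequence[Point], cell: int = 40) -> Point:
    """Static baseline (assumed form): bucket candidates into a coarse grid
    and return the centroid of the most populated cell."""
    buckets: Dict[Tuple[int, int], List[Point]] = {}
    for x, y in candidates:
        buckets.setdefault((x // cell, y // cell), []).append((x, y))
    largest = max(buckets.values(), key=len)
    cx = round(sum(p[0] for p in largest) / len(largest))
    cy = round(sum(p[1] for p in largest) / len(largest))
    return (cx, cy)


def critic_select(candidates: Sequence[Point],
                  critic: Callable[[Point], float]) -> Point:
    """Learnable selection (sketch): return the candidate the critic scores
    highest. `critic` is a hypothetical stand-in for a visual critic model."""
    return max(candidates, key=critic)


def toy_critic(point: Point) -> float:
    """Toy scorer, NOT a real model: prefers points near an assumed target."""
    target = (302, 198)  # hypothetical location of the intended element
    return -abs(point[0] - target[0]) - abs(point[1] - target[1])


if __name__ == "__main__":
    # Three proposals land on a look-alike element, one on the intended one.
    proposals = [(100, 100), (105, 102), (98, 97), (300, 200)]
    print(self_consistency_select(proposals))   # (101, 100)
    print(critic_select(proposals, toy_critic))  # (300, 200)
```

In this toy case the majority of proposals fall on a visually similar element, so the vote-based baseline returns the wrong location, while a selector that scores each candidate individually can still pick the minority proposal.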
