Co-Evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding
A new reinforcement learning framework, Propose-then-Critic, co-evolves a proposer and a visual critic to improve GUI grounding, the task of mapping natural-language instructions to precise pixel coordinates. It targets two challenging settings, visually homogeneous elements and dense layouts. Instead of static self-consistency strategies, it uses a learnable selection mechanism. The proposer suggests candidate locations, each proposal is rendered on the screenshot, and the critic judges the rendered proposals to select the target. Training uses maturity-aware adaptive co-evolutionary reinforcement learning, which optimizes both components jointly and is designed to overcome the disparity between the model's grounding and critiquing capabilities. The paper is available on arXiv as 2604.21268.
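The contrast between static self-consistency and critic-based selection can be sketched in Python. This is a minimal, hypothetical illustration, not the paper's implementation. The names `render_proposal`, `self_consistency_select`, `critic_select`, and `toy_critic` are introduced here. The screenshot is a toy grayscale grid, and the "critic" is an oracle stand-in for a trained visual model. The static baseline picks the proposal with the most nearby agreeing proposals, one common form of self-consistency that is assumed here. The learnable variant instead scores each proposal after drawing it on the screenshot.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int]      # (x, y) pixel coordinates
Image = List[List[int]]      # toy grayscale screenshot: rows of pixel values
MARK = 255                   # hypothetical marker intensity used to draw a proposal
Critic = Callable[[str, Image], float]


def render_proposal(screenshot: Image, p: Point, size: int = 1) -> Image:
    """Return a copy of the screenshot with a square marker drawn around p."""
    out = [row[:] for row in screenshot]
    h, w = len(out), len(out[0])
    x, y = p
    for yy in range(max(0, y - size), min(h, y + size + 1)):
        for xx in range(max(0, x - size), min(w, x + size + 1)):
            out[yy][xx] = MARK
    return out


def self_consistency_select(cands: Sequence[Point], radius: float = 3.0) -> Point:
    """Static baseline (assumed form): the proposal with the most proposals within `radius`."""
    def support(p: Point) -> int:
        return sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= radius ** 2 for q in cands)
    return max(cands, key=support)


def critic_select(instruction: str, screenshot: Image,
                  cands: Sequence[Point], critic: Critic) -> Point:
    """Learnable selection: score every proposal as rendered on the screenshot, keep the best."""
    return max(cands, key=lambda p: critic(instruction, render_proposal(screenshot, p)))


def toy_critic(instruction: str, rendered: Image) -> float:
    """Oracle stand-in for a trained visual critic (hypothetical).

    It locates the drawn marker and returns 1.0 if the marker's centre lies inside
    the assumed target box x in [14, 17], y in [2, 5], else 0.0.
    """
    pts = [(x, y) for y, row in enumerate(rendered) for x, v in enumerate(row) if v == MARK]
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    return 1.0 if 14 <= cx <= 17 and 2 <= cy <= 5 else 0.0


# Toy scene: a blank 20x20 screenshot. Three proposals cluster on a look-alike
# element near (4, 4); one proposal lands on the true target near (15, 3).
screenshot: Image = [[0] * 20 for _ in range(20)]
cands: List[Point] = [(4, 4), (5, 4), (4, 5), (15, 3)]

print(self_consistency_select(cands))                                          # (4, 4)
print(critic_select("click the Save button", screenshot, cands, toy_critic))   # (15, 3)
```

In this toy scene, the density-based baseline follows the majority cluster on the look-alike element. The critic sees each proposal in visual context, so it picks the isolated proposal on the correct element. Under these toy assumptions, this illustrates why a fixed agreement rule can fail when several elements look the same.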
Key facts
- arXiv paper 2604.21268 proposes the Propose-then-Critic framework for GUI grounding.
- Framework co-evolves a proposer and a visual critic via reinforcement learning.
- Replaces static self-consistency strategies with a learnable selection mechanism.
- Addresses visually homogeneous elements and dense layouts in GUI grounding.
- Uses maturity-aware adaptive co-evolutionary reinforcement learning.
- Critiques proposals rendered on screenshots to select the optimal target.
- Overcomes the disparity between grounding and critiquing capabilities.
- Announced on arXiv as a cross-list (announcement type: cross).
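Maturity-aware adaptation is stated above only at a high level. The following sketch is a purely hypothetical illustration of one way a training loop could balance two components whose capabilities develop at different rates. It tracks a "maturity" score for each component as an exponential moving average of a success rate, then routes more update effort to the less mature one. `Maturity`, `update_shares`, the logistic weighting, and `temperature` are assumptions introduced for this sketch, not the paper's algorithm.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Maturity:
    """Hypothetical 'maturity' proxy: an exponential moving average of a success rate."""
    value: float = 0.0
    decay: float = 0.9

    def update(self, success_rate: float) -> float:
        self.value = self.decay * self.value + (1.0 - self.decay) * success_rate
        return self.value


def update_shares(proposer: Maturity, critic: Maturity,
                  temperature: float = 0.1) -> Tuple[float, float]:
    """Split training effort between (proposer, critic).

    The less mature component receives the larger share; equal maturity gives 50/50.
    The logistic form and `temperature` are illustrative assumptions.
    """
    critic_share = 1.0 / (1.0 + math.exp(-(proposer.value - critic.value) / temperature))
    return 1.0 - critic_share, critic_share


# Example: grounding matures faster than critiquing, so the critic gets most updates.
proposer, critic = Maturity(), Maturity()
for _ in range(20):
    proposer.update(0.8)   # e.g. fraction of proposals that hit the target
    critic.update(0.3)     # e.g. fraction of batches where the critic picked a correct proposal
p_share, c_share = update_shares(proposer, critic)
print(f"proposer {p_share:.2f}, critic {c_share:.2f}")
```

A smooth weighting like this shifts effort gradually as the gap between the two components closes, rather than switching abruptly between them. Whether the paper uses anything similar is not stated in this summary.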
Entities
Institutions
- arXiv