ARTFEED — Contemporary Art Intelligence

Visually-Guided Policy Optimization Enhances VLM Reasoning

ai-technology · 2026-05-25

Visually-Guided Policy Optimization (VGPO) is a new framework that targets weak visual faithfulness in vision-language models (VLMs) trained with reinforcement learning with verifiable rewards (RLVR). The authors' empirical analysis identifies two problems: attention to visual tokens is only sparsely activated, and visual information is progressively forgotten across reasoning steps (temporal visual forgetting). VGPO counters both with a Visual Attention Compensation mechanism, which uses visual similarity to amplify visual cues and progressively raises visual expectations at later reasoning steps. It also applies a dual-grained advantage re-weighting strategy across the steps within a trajectory. The paper is available on arXiv as 2604.09349.
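To make the compensation idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the summary gives no formulas, so the function names, the cosine-similarity measure, the linear step schedule, and the `base_gain` parameter are all assumptions chosen for illustration. The sketch boosts attention weights on visual tokens in proportion to their similarity to the current query, with a gain that grows at later reasoning steps, then renormalizes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def compensate_attention(attn, is_visual, query, keys, step, total_steps, base_gain=1.0):
    """Illustrative only: amplify attention on visual tokens, more strongly at later steps.

    attn      -- positive attention weights over tokens
    is_visual -- booleans marking which tokens are visual
    query     -- hypothetical query vector for the current reasoning step
    keys      -- hypothetical key vectors, one per token
    """
    # Assumed schedule: the gain rises linearly with the reasoning step index.
    gain = base_gain * (step + 1) / total_steps
    boosted = []
    for weight, visual, key in zip(attn, is_visual, keys):
        if visual:
            # Only non-negative similarity contributes to the boost.
            similarity = max(0.0, cosine(query, key))
            weight = weight * (1.0 + gain * similarity)
        boosted.append(weight)
    total = sum(boosted)
    # Renormalize so the weights remain a distribution.
    return [w / total for w in boosted]


attn = [0.25, 0.25, 0.25, 0.25]
is_visual = [True, True, False, False]
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

early = compensate_attention(attn, is_visual, query, keys, step=0, total_steps=2)
late = compensate_attention(attn, is_visual, query, keys, step=1, total_steps=2)
```

In this toy setup the first visual token matches the query, so its share of attention rises above the uniform 0.25 at the early step and rises further at the late step. The second visual token is orthogonal to the query and gets no boost.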

Key facts

  • VGPO stands for Visually-Guided Policy Optimization
  • RLVR is reinforcement learning with verifiable rewards
  • VLMs are vision-language models
  • Visual Attention Compensation mechanism uses visual similarity
  • Dual-grained advantage re-weighting is applied intra-trajectory
  • Paper ID: arXiv:2604.09349
  • Announce type: replace-cross
  • Empirical analysis reveals temporal visual forgetting
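
The intra-trajectory advantage re-weighting can be illustrated in the same hedged way. The summary does not specify the weighting rule or what the two granularities are, so the sketch below shows a single, assumed scheme: a trajectory-level advantage is redistributed over steps according to hypothetical per-step visual activation scores, with weights that average to 1 so the mean advantage is unchanged. The names `reweight_advantages`, `visual_scores`, and `strength` are illustrative, not taken from the paper.

```python
def reweight_advantages(advantage, visual_scores, strength=0.5):
    """Illustrative only: scale a trajectory-level advantage per step by visual activation.

    advantage     -- scalar advantage assigned to the whole trajectory
    visual_scores -- hypothetical non-negative visual activation score per step
    strength      -- assumed mixing factor in [0, 1]; 0 leaves the advantage uniform
    """
    n = len(visual_scores)
    mean_score = sum(visual_scores) / n
    if mean_score == 0:
        # No visual signal at all: fall back to the uniform advantage.
        return [advantage] * n
    # Each weight is 1 plus a scaled deviation from the mean score,
    # so the weights average to exactly 1.
    return [
        advantage * (1.0 + strength * (score / mean_score - 1.0))
        for score in visual_scores
    ]


per_step = reweight_advantages(2.0, [0.1, 0.3, 0.2])
```

With these inputs, the step with the highest visual score receives the largest share of the advantage, while the average over steps stays at the original value of 2.0.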

Entities

Institutions

  • arXiv
