ARTFEED — Contemporary Art Intelligence

Persistent Visual Memory Module Enhances LVLM Visual Perception

ai-technology · 2026-05-04

A research paper introduces Persistent Visual Memory (PVM), a lightweight module that addresses the 'Visual Signal Dilution' problem in autoregressive Large Vision-Language Models (LVLMs): as the generated text sequence lengthens, attention to the visual tokens decays. PVM is integrated as a parallel branch alongside the Feed-Forward Network (FFN), creating a distance-agnostic retrieval pathway that supplies visual embeddings directly to each decoding step. This structural intervention mitigates the suppression of the visual signal late in long generations. Experiments on Qwen3-VL models show consistent accuracy gains with minimal parameter overhead. The paper is available on arXiv under identifier 2605.00814.
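To make the architectural idea concrete, here is a minimal PyTorch sketch of a decoder layer with a memory branch running in parallel to the FFN. This is an illustrative assumption, not the paper's actual implementation: the class names, the cross-attention retrieval, and the learned gate are all hypothetical choices that merely embody the described idea of a distance-agnostic pathway from hidden states to cached visual embeddings.

```python
import torch
import torch.nn as nn

class PersistentVisualMemory(nn.Module):
    """Hypothetical PVM branch (illustrative, not the paper's design).

    Every decoding step queries the cached visual embeddings directly,
    so retrieval strength does not depend on how far the current token
    is from the image tokens in the sequence ("distance-agnostic").
    """
    def __init__(self, d_model: int, d_visual: int):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)
        self.key_proj = nn.Linear(d_visual, d_model)
        self.value_proj = nn.Linear(d_visual, d_model)
        # Learned gate deciding how much visual signal to inject per token.
        self.gate = nn.Linear(d_model, 1)

    def forward(self, hidden: torch.Tensor, visual_memory: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model); visual_memory: (batch, n_vis, d_visual)
        q = self.query_proj(hidden)
        k = self.key_proj(visual_memory)
        v = self.value_proj(visual_memory)
        attn = torch.softmax(q @ k.transpose(-1, -2) / q.size(-1) ** 0.5, dim=-1)
        retrieved = attn @ v                       # (batch, seq, d_model)
        return torch.sigmoid(self.gate(hidden)) * retrieved

class DecoderLayerWithPVM(nn.Module):
    """FFN and PVM run as parallel branches; their outputs are summed
    into the residual stream (a sketch of the integration point)."""
    def __init__(self, d_model: int = 256, d_visual: int = 128):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.pvm = PersistentVisualMemory(d_model, d_visual)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, hidden: torch.Tensor, visual_memory: torch.Tensor) -> torch.Tensor:
        h = self.norm(hidden)
        return hidden + self.ffn(h) + self.pvm(h, visual_memory)
```

Because the PVM branch adds only a few projection matrices per layer, a design along these lines would keep parameter overhead small relative to the base model, consistent with the paper's claim of negligible overhead.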

Key facts

  • PVM is a lightweight learnable module for LVLMs.
  • It addresses 'Visual Signal Dilution' where visual attention decays with generated sequence length.
  • PVM is integrated as a parallel branch alongside the Feed-Forward Network (FFN).
  • It establishes a distance-agnostic retrieval pathway for direct visual embeddings.
  • Experiments were conducted on Qwen3-VL models.
  • PVM brings notable improvements with negligible parameter overhead.
  • The paper is published on arXiv with ID 2605.00814.
  • The module is designed to provide sustained, on-demand visual perception throughout generation.

Entities

Institutions

  • arXiv
