VLM Reliability Probe Reveals Attention Is Not a Predictor of Correctness
A recent mechanistic investigation challenges the widely held belief that distinct attention maps in vision-language models (VLMs) signal correct responses. The researchers applied the VLM Reliability Probe (VRP) pipeline to three open-weight VLM families: LLaVA-1.5, PaliGemma, and Qwen2-VL (3-7B parameters). Attention structure turns out to be a near-zero predictor of correctness: the point-biserial correlation between the attention statistic C_k and correctness y is R_pb(C_k, y) = 0.001 (95% CI [-0.034, 0.036]). Attention is nonetheless causally necessary for feature extraction: masking the top 30% of attended patches reduces accuracy by 8.2-11.3 percentage points (p < 0.001). Reliability instead becomes legible later in computation, in hidden-state geometry. The study offers a unified framework for assessing VLM reliability.
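The reported R_pb is a point-biserial correlation: the Pearson correlation between a continuous score and a dichotomous label. Below is a minimal sketch of that computation with a percentile-bootstrap 95% CI, assuming C_k is some per-sample attention statistic and y a 0/1 correctness label; the variable names and placeholder data are illustrative, not the paper's.

```python
# Point-biserial correlation between a continuous attention statistic (C_k)
# and binary correctness (y), with a percentile-bootstrap 95% CI.
# All names and data here are illustrative stand-ins, not the paper's.
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(0)

n = 3090                               # size of the pooled analysis split
attn_stat = rng.random(n)              # C_k: per-sample attention statistic
correct = rng.integers(0, 2, size=n)   # y: 1 = correct answer, 0 = incorrect

# pointbiserialr takes the binary variable first, the continuous one second.
r_pb, p_value = pointbiserialr(correct, attn_stat)

# Percentile bootstrap, mirroring the paper's interval-style reporting.
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)   # resample with replacement
    boot.append(pointbiserialr(correct[idx], attn_stat[idx])[0])
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"R_pb = {r_pb:.3f}, p = {p_value:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

On random stand-in data this prints a correlation near zero with a CI straddling it, the same qualitative pattern the study reports for real attention statistics.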
Key facts
- Attention structure is a near-zero predictor of correctness (R_pb(C_k, y) = 0.001, 95% CI [-0.034, 0.036]).
- Masking the top 30% of attended patches drops accuracy by 8.2-11.3 percentage points (p < 0.001); see the masking sketch after this list.
- Three VLM families tested: LLaVA-1.5, PaliGemma, Qwen2-VL (3-7B parameters).
- VLM Reliability Probe (VRP) pipeline compares attention, generation dynamics, and hidden-state geometry.
- Analysis used a pooled evaluation split of n = 3,090 samples.
- Reliability becomes legible later in computation via hidden-state geometry; see the probe sketch after this list.
- Attention remains causally necessary for feature extraction despite low predictive power.
- Study challenges the Attention-Confidence Assumption.
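The masking result is causal, not correlational: removing the most-attended visual evidence degrades accuracy even though attention structure does not predict correctness. Here is a minimal sketch of the ablation step, assuming per-patch attention weights are available; the function and shapes are hypothetical, since this summary does not specify the paper's pipeline.

```python
# Ablate the top-30% most-attended image patches before the model answers.
# The interface (patch embeddings + per-patch attention weights) is a
# hypothetical stand-in for however the VRP pipeline extracts them.
import torch

def mask_top_attended_patches(patch_embeds: torch.Tensor,
                              attn_weights: torch.Tensor,
                              frac: float = 0.30) -> torch.Tensor:
    """Zero out the `frac` most-attended patches.

    patch_embeds: (num_patches, dim) visual tokens for one image
    attn_weights: (num_patches,) attention mass received by each patch
    """
    k = max(1, int(frac * attn_weights.numel()))
    top_idx = torch.topk(attn_weights, k).indices
    masked = patch_embeds.clone()
    masked[top_idx] = 0.0              # remove the most-attended evidence
    return masked

# Toy usage: 576 patches (a 24x24 grid, as in LLaVA-1.5's vision tower).
embeds = torch.randn(576, 1024)
weights = torch.rand(576)
ablated = mask_top_attended_patches(embeds, weights)
assert int((ablated == 0).all(dim=1).sum()) == int(0.30 * 576)
```

Zeroing is the simplest ablation; a real pipeline might instead substitute a mean embedding or a learned mask token, which changes the distribution shift the model sees.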
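The hidden-state claim suggests correctness becomes linearly decodable from late-layer representations. One common way to test this, not necessarily the paper's, is a linear probe; a minimal sketch follows, with random stand-in features in place of real VLM hidden states.

```python
# Linear probe: predict answer correctness from a late-layer hidden state.
# Real feature extraction from a VLM is stubbed out with random data, so the
# printed accuracy will hover near chance (~0.5); on genuinely informative
# hidden states it should rise well above it.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n, dim = 3090, 4096                    # pooled split size, hidden width
hidden = rng.normal(size=(n, dim)).astype(np.float32)   # stand-in features
correct = rng.integers(0, 2, size=n)                    # 0/1 correctness labels

X_tr, X_te, y_tr, y_te = train_test_split(hidden, correct,
                                          test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")
```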