VLM Reliability Probe Reveals Attention Is Not a Predictor of Correctness
A recent mechanistic investigation challenges the widely held belief that distinct attention maps in vision-language models (VLMs) signal correct responses. The researchers applied the VLM Reliability Probe (VRP) pipeline to three open-weight VLM families: LLaVA-1.5, PaliGemma, and Qwen2-VL (3-7B parameters). Attention structure turns out to be a near-zero predictor of correctness: the point-biserial correlation between the attention statistic C_k and correctness y is R_pb(C_k, y) = 0.001 (95% CI [-0.034, 0.036]). Attention is nonetheless causally necessary for feature extraction: masking the top 30% of attended patches reduces accuracy by 8.2-11.3 percentage points (p < 0.001). Reliability instead becomes legible later in computation, in hidden-state geometry. The study offers a unified framework for assessing VLM reliability.
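The reported R_pb is a point-biserial correlation: the Pearson correlation between a continuous score and a dichotomous label. Below is a minimal sketch of that computation with a percentile-bootstrap 95% CI, assuming C_k is some per-sample attention statistic and y a 0/1 correctness label; the variable names and placeholder data are illustrative, not the paper's.

```python
# Point-biserial correlation between a continuous attention statistic (C_k)
# and binary correctness (y), with a percentile-bootstrap 95% CI.
# All names and data here are illustrative stand-ins, not the paper's.
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(0)

n = 3090                               # size of the pooled analysis split
attn_stat = rng.random(n)              # C_k: per-sample attention statistic
correct = rng.integers(0, 2, size=n)   # y: 1 = correct answer, 0 = incorrect

# pointbiserialr takes the binary variable first, the continuous one second.
r_pb, p_value = pointbiserialr(correct, attn_stat)

# Percentile bootstrap, mirroring the paper's interval-style reporting.
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)   # resample with replacement
    boot.append(pointbiserialr(correct[idx], attn_stat[idx])[0])
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"R_pb = {r_pb:.3f}, p = {p_value:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

On random stand-in data this prints a correlation near zero with a CI straddling it, the same qualitative pattern the study reports for real attention statistics.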
Key facts
- Attention structure is a near-zero predictor of correctness (R_pb(C_k, y) = 0.001, 95% CI [-0.034, 0.036]).
- Masking the top 30% of attended patches drops accuracy by 8.2-11.3 percentage points (p < 0.001); see the masking sketch after this list.
- Three VLM families tested: LLaVA-1.5, PaliGemma, Qwen2-VL (3-7B parameters).
- VLM Reliability Probe (VRP) pipeline compares attention, generation dynamics, and hidden-state geometry.
- Analysis used a pooled evaluation split of n = 3,090 samples.
- Reliability becomes legible later in computation via hidden-state geometry; see the probe sketch after this list.
- Attention remains causally necessary for feature extraction despite low predictive power.
- Study challenges the Attention-Confidence Assumption.
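The masking result is causal, not correlational: removing the most-attended visual evidence degrades accuracy even though attention structure does not predict correctness. Here is a minimal sketch of the ablation step, assuming per-patch attention weights are available; the function and shapes are hypothetical, since this summary does not specify the paper's pipeline.

```python
# Ablate the top-30% most-attended image patches before the model answers.
# The interface (patch embeddings + per-patch attention weights) is a
# hypothetical stand-in for however the VRP pipeline extracts them.
import torch

def mask_top_attended_patches(patch_embeds: torch.Tensor,
                              attn_weights: torch.Tensor,
                              frac: float = 0.30) -> torch.Tensor:
    """Zero out the `frac` most-attended patches.

    patch_embeds: (num_patches, dim) visual tokens for one image
    attn_weights: (num_patches,) attention mass received by each patch
    """
    k = max(1, int(frac * attn_weights.numel()))
    top_idx = torch.topk(attn_weights, k).indices
    masked = patch_embeds.clone()
    masked[top_idx] = 0.0              # remove the most-attended evidence
    return masked

# Toy usage: 576 patches (a 24x24 grid, as in LLaVA-1.5's vision tower).
embeds = torch.randn(576, 1024)
weights = torch.rand(576)
ablated = mask_top_attended_patches(embeds, weights)
assert int((ablated == 0).all(dim=1).sum()) == int(0.30 * 576)
```

Zeroing is the simplest ablation; a real pipeline might instead substitute a mean embedding or a learned mask token, which changes the distribution shift the model sees.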
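The hidden-state claim suggests correctness becomes linearly decodable from late-layer representations. One common way to test this, not necessarily the paper's, is a linear probe; a minimal sketch follows, with random stand-in features in place of real VLM hidden states.

```python
# Linear probe: predict answer correctness from a late-layer hidden state.
# Real feature extraction from a VLM is stubbed out with random data, so the
# printed accuracy will hover near chance (~0.5); on genuinely informative
# hidden states it should rise well above it.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n, dim = 3090, 4096                    # pooled split size, hidden width
hidden = rng.normal(size=(n, dim)).astype(np.float32)   # stand-in features
correct = rng.integers(0, 2, size=n)                    # 0/1 correctness labels

X_tr, X_te, y_tr, y_te = train_test_split(hidden, correct,
                                          test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")
```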