Causal Probing Reveals How MLLMs Encode Visual Concepts
A new paper on arXiv proposes a causal framework that uses activation steering to probe internal visual representations in Multimodal Large Language Models (MLLMs). The study finds that entities are encoded through distinct, localized memorization, whereas abstract concepts are distributed globally across the network. This divergence has direct implications for scaling: increasing model depth is crucial for encoding abstract concepts, while entity localization remains invariant to scale. Reverse steering further shows that blocking a concept's explicit output triggers surges in its latent activation.
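As a concrete illustration of what such a steering intervention could look like, here is a minimal PyTorch sketch. It assumes a LLaMA-style HuggingFace model whose decoder layers live at `model.model.layers` and return tuples whose first element is the hidden state of shape (batch, seq_len, d_model); the concept vector, steering coefficient `alpha`, and target token are hypothetical placeholders, not the paper's actual setup.

```python
# A minimal sketch of activation steering as a causal probe.
# All names here (concept_vector, alpha, target_id) are illustrative
# assumptions, not values from the paper.
import torch

def steer_layer(model, layer_idx, concept_vector, alpha=4.0):
    """Add a scaled concept direction to one layer's residual stream;
    returns the hook handle so the intervention can be undone."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * concept_vector  # the causal intervention
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return model.model.layers[layer_idx].register_forward_hook(hook)

def causal_effect_by_layer(model, input_ids, concept_vector, target_id):
    """Sweep the intervention across depth and record how much each layer's
    steering shifts the probability of a concept-bearing target token."""
    effects = []
    with torch.no_grad():
        base = model(input_ids).logits[0, -1].softmax(-1)[target_id]
        for layer_idx in range(len(model.model.layers)):
            handle = steer_layer(model, layer_idx, concept_vector)
            steered = model(input_ids).logits[0, -1].softmax(-1)[target_id]
            handle.remove()
            effects.append((steered - base).item())
    # A sharp peak at a few layers suggests localized encoding;
    # a flat profile across depth suggests distributed encoding.
    return effects
```

On this reading, an entity concept would show a narrow spike at a few layers, while an abstract concept would show comparable effects across many layers, matching the localization versus distribution contrast reported in the paper.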
Key facts
- Paper titled "Causal Probing for Internal Visual Representations in Multimodal Large Language Models"
- Published on arXiv with ID 2605.05593v1
- Proposes a causal framework based on activation steering
- Systematic intervention across four visual concept categories
- Entities exhibit distinct localized memorization
- Abstract concepts are globally distributed across the network
- Increasing model depth is indispensable for encoding abstract concepts
- Entity localization remains invariant to scale
- Reverse steering uncovers latent activation surges when explicit output is blocked (see the sketch after this list)
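The reverse-steering result in the last bullet can be illustrated with the same hook pattern. Below is a hedged sketch of one simple way to realize it in a single forward pass: the concept direction is subtracted at an earlier layer to suppress ("block") its explicit expression, and its projection is read out at a deeper layer; a "surge" would appear as the blocked projection exceeding the free one. Again, `concept_vector`, the layer indices, and `alpha` are hypothetical stand-ins, and the paper's actual procedure may differ.

```python
# A hedged sketch of reverse steering: suppress a concept upstream and
# measure its latent activation downstream. Layer indices and the concept
# vector are illustrative assumptions.
import torch

def blocked_vs_free(model, input_ids, concept_vector,
                    block_idx, read_idx, alpha=4.0):
    """Compare the concept direction's latent activation at a deep layer
    (read_idx) with and without suppressing it at an earlier layer
    (block_idx). Requires block_idx < read_idx so the block is upstream
    of the probe."""
    captured = {}

    def read_hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        # project the last position onto the concept direction
        captured["proj"] = (hidden[0, -1] @ concept_vector).item()

    def block_hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden - alpha * concept_vector  # steer away from the concept
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

    layers = model.model.layers
    reader = layers[read_idx].register_forward_hook(read_hook)
    with torch.no_grad():
        model(input_ids)          # unperturbed pass
    free = captured["proj"]
    blocker = layers[block_idx].register_forward_hook(block_hook)
    with torch.no_grad():
        model(input_ids)          # pass with the concept suppressed upstream
    blocked = captured["proj"]
    blocker.remove()
    reader.remove()
    return free, blocked  # blocked > free would be the reported latent surge
```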