Emotional Circuits in Large Vision-Language Models Decoded via Causal Framework

ai-technology · 2026-05-23

Researchers have proposed a causal attribution framework based on steering vectors to explain how Large Vision-Language Models (LVLMs) process emotion. They built a dedicated dataset to examine a three-stage mechanism they call 'Adapt-Aggregate-Execute.' The analysis revealed a functional decoupling: middle layers aggregate emotional visual cues through sentiment-specific attention heads, and deeper layers then translate those cues into narrative output through emotion-general pathways. The work addresses a gap in understanding how LVLMs turn visual inputs into emotional narratives, drawing on visual counterfactuals and causal analysis.
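For readers unfamiliar with steering vectors, the general idea can be illustrated with a minimal sketch. The summary does not say how the researchers compute or apply their vectors, so everything here is an assumption: the sketch uses the common difference-of-means construction, the function names (`steering_vector`, `steer`) and the scale `alpha` are hypothetical, and the activations are toy numbers rather than values from any model.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def steering_vector(emotional_acts, neutral_acts):
    """Assumed construction: mean emotional activation minus mean neutral activation."""
    mu_emotional = mean_vector(emotional_acts)
    mu_neutral = mean_vector(neutral_acts)
    return [e - n for e, n in zip(mu_emotional, mu_neutral)]


def steer(hidden, vector, alpha=1.0):
    """Shift a hidden state along the steering direction, scaled by alpha."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy activations (hypothetical, 3-dimensional) for two emotional and two neutral inputs.
emotional = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
neutral = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

v = steering_vector(emotional, neutral)
steered = steer([0.5, 0.5, 0.5], v, alpha=0.5)
```

In a causal attribution setting, a shift like this would be applied at one layer or attention head at a time, and the change in the model's output would indicate how much that component contributes to the emotional behavior.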

Key facts

Steering-vector-based causal attribution framework introduced for LVLMs
Specialized dataset built to analyze emotional circuits
Three-stage mechanism: Adapt-Aggregate-Execute
Functional decoupling discovered between middle and deep layers
Middle layers aggregate visual emotional cues via sentiment-specific attention heads
Deep layers translate cues into narrative generation through emotion-general pathways
Addresses scarcity of visual counterfactuals in emotion understanding
Published on arXiv with ID 2605.21980
