Emotional Circuits in Large Vision-Language Models Decoded via Causal Framework
Researchers have proposed a causal attribution framework based on steering vectors to explain how Large Vision-Language Models (LVLMs) process emotion. Using a dedicated dataset, they examine a three-stage mechanism called 'Adapt-Aggregate-Execute.' The findings reveal a functional decoupling: sentiment-specific attention heads in the middle layers aggregate visual emotional cues, and emotion-general pathways in the deep layers then translate those cues into narrative output. Using visual counterfactuals and causal analysis, the work addresses a gap in understanding how LVLMs convert visual inputs into emotional storytelling.
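The summary does not spell out how the framework computes or applies its steering vectors. As a rough illustration of the general idea only, the sketch below assumes the common difference-of-means construction: a steering vector is the mean activation on emotional inputs minus the mean activation on their neutral counterfactuals, and its causal effect is the change in some output score when the vector is added to a hidden state. All function names, the toy linear readout, and the numbers are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def steering_vector(emotional_acts, neutral_acts):
    """Assumed construction: mean emotional activation minus mean neutral
    (counterfactual) activation at one layer."""
    mu_emotional = mean_vector(emotional_acts)
    mu_neutral = mean_vector(neutral_acts)
    return [e - n for e, n in zip(mu_emotional, mu_neutral)]


def steer(hidden, vector, alpha=1.0):
    """Intervene on a hidden state by adding the scaled steering vector."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


def readout(hidden, weights):
    """Toy linear readout standing in for an emotion score on the output."""
    return sum(h * w for h, w in zip(hidden, weights))


def causal_effect(hidden, vector, weights, alpha=1.0):
    """Change in the readout caused by the steering intervention."""
    return readout(steer(hidden, vector, alpha), weights) - readout(hidden, weights)


# Toy activations for two emotional images and their neutral counterfactuals.
emotional = [[1.0, 2.0], [3.0, 4.0]]
neutral = [[0.0, 1.0], [2.0, 1.0]]

vec = steering_vector(emotional, neutral)
effect = causal_effect([0.0, 0.0], vec, weights=[1.0, -1.0])
print(vec, effect)
```

In a real LVLM the activations would come from a chosen layer or attention head, and repeating the intervention across layers is one way an attribution study could locate where emotional cues are aggregated and where they are executed.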
Key facts
- Steering-vector-based causal attribution framework introduced for LVLMs
- Specialized dataset built to analyze emotional circuits
- Three-stage mechanism: Adapt-Aggregate-Execute
- Functional decoupling discovered between middle and deep layers
- Middle layers aggregate visual emotional cues via sentiment-specific attention heads
- Deep layers translate cues into narrative generation through emotion-general pathways
- Addresses scarcity of visual counterfactuals in emotion understanding
- Published on arXiv with ID 2605.21980
Entities
Institutions
- arXiv