ARTFEED — Contemporary Art Intelligence

Emotional Circuits in Large Vision-Language Models Decoded via Causal Framework

ai-technology · 2026-05-23

Researchers have proposed a causal attribution framework based on steering vectors to explain how Large Vision-Language Models (LVLMs) process emotion. Using a purpose-built dataset, they examined a three-stage mechanism called 'Adapt-Aggregate-Execute.' The findings point to a functional decoupling: in the middle layers, sentiment-specific attention heads aggregate emotional visual cues, and in the deeper layers, emotion-general pathways translate those cues into narrative output. The work combines visual counterfactuals with causal analysis to address a gap in understanding how LVLMs turn visual inputs into emotional storytelling.
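For readers unfamiliar with steering vectors, the general idea can be sketched on toy data. This is a minimal illustration only, not the paper's implementation: it assumes a steering vector is the difference of mean activations between two contrasting input sets, and it scores each attention head by how much a linear readout changes when that head's component along the steering direction is removed. The function names, array shapes, and readout are hypothetical.

```python
import numpy as np


def steering_vector(pos_acts, neg_acts):
    """Difference of mean activations between two input sets, each of shape (n, d)."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)


def head_attribution(head_outputs, direction, readout):
    """Score heads by the readout change when their component along `direction` is removed.

    head_outputs: (n_heads, d) per-head contributions; their sum is the layer output.
    direction:    (d,) steering direction.
    readout:      (d,) hypothetical linear probe standing in for the model's output.
    """
    unit = direction / np.linalg.norm(direction)
    base = readout @ head_outputs.sum(axis=0)
    scores = []
    for h in range(head_outputs.shape[0]):
        ablated = head_outputs.copy()
        # Project out the steering direction from this head only.
        ablated[h] -= (ablated[h] @ unit) * unit
        scores.append(base - readout @ ablated.sum(axis=0))
    return np.array(scores)


# Toy data: "emotional" activations are neutral ones shifted along the first axis.
rng = np.random.default_rng(0)
neg = rng.normal(size=(50, 4))
pos = neg + np.array([2.0, 0.0, 0.0, 0.0])
v = steering_vector(pos, neg)

# Three hypothetical heads; the first carries most of the signal along v.
heads = np.array([
    [3.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.0],
])
readout = np.array([1.0, 0.0, 0.0, 0.0])
scores = head_attribution(heads, v, readout)
print("steering vector:", v)
print("head scores:", scores)
```

In this toy setup the first head receives the largest score because it contributes most along the steering direction. A real analysis would take its activations from an actual LVLM rather than from hand-built arrays.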

Key facts

  • Steering-vector-based causal attribution framework introduced for LVLMs
  • Specialized dataset built to analyze emotional circuits
  • Three-stage mechanism: Adapt-Aggregate-Execute
  • Functional decoupling discovered between middle and deep layers
  • Middle layers aggregate visual emotional cues via sentiment-specific attention heads
  • Deep layers translate cues into narrative generation through emotion-general pathways
  • Addresses scarcity of visual counterfactuals in emotion understanding
  • Published on arXiv with ID 2605.21980

Entities

Institutions

  • arXiv

Sources