VISAFF: Speaker-Centered Visual Affective Feature Learning for ERC

ai-technology · 2026-05-20

A new framework called VISAFF (Speaker-Centered Visual Affective Feature Learning) targets Emotion Recognition in Conversation (ERC) by focusing on the active speaker rather than on the background or listeners. It addresses limitations of text-only methods and of Vision-Language Models with a two-stage approach: first, speaker-centered affective feature extraction; second, integration of those visual features with linguistic and prosodic cues. The method aims to reduce computational costs while improving accuracy in complex scenarios such as sarcasm. The paper is published on arXiv under ID 2605.18547.

Key facts

VISAFF stands for Speaker-Centered Visual Affective Feature Learning.
It is designed for Emotion Recognition in Conversation (ERC).
The framework focuses on the active speaker, not background or listeners.
It uses a two-stage process: speaker-centered affective feature extraction, followed by integration with linguistic and prosodic cues.
It addresses limitations of text-only methods and Vision-Language Models.
The method aims to reduce computational costs.
It aims to improve accuracy in complex scenarios such as sarcasm.
The paper is on arXiv with ID 2605.18547.
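
To make the two-stage idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the Utterance type, the speaker_box field, the toy intensity statistics used as "visual features", and fusion by simple concatenation are all assumptions chosen for illustration. It only shows the general shape of the pipeline: restrict the visual input to the active speaker, then combine the resulting features with text and prosody features.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A grayscale frame as rows of pixel intensities (hypothetical toy format).
Frame = List[List[float]]


@dataclass
class Utterance:
    frame: Frame
    # Hypothetical active-speaker bounding box: (top, left, bottom, right),
    # with bottom/right exclusive. A real system would get this from an
    # active-speaker detector; here it is simply given.
    speaker_box: Tuple[int, int, int, int]
    text_features: List[float]      # stand-in for linguistic features
    prosody_features: List[float]   # stand-in for prosodic features


def crop_speaker(frame: Frame, box: Tuple[int, int, int, int]) -> Frame:
    """Stage 1a: keep only the active speaker's region, dropping background."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def visual_affect_features(region: Frame) -> List[float]:
    """Stage 1b: toy placeholder features (mean and range of intensity).

    A learned visual encoder would go here; these statistics are only
    a stand-in so the sketch runs without any model.
    """
    pixels = [p for row in region for p in row]
    mean = sum(pixels) / len(pixels)
    return [mean, max(pixels) - min(pixels)]


def fuse(utterance: Utterance) -> List[float]:
    """Stage 2: integrate visual features with text and prosody features.

    Concatenation is an assumption; the source does not say how the
    modalities are combined.
    """
    region = crop_speaker(utterance.frame, utterance.speaker_box)
    visual = visual_affect_features(region)
    return visual + utterance.text_features + utterance.prosody_features


if __name__ == "__main__":
    frame = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 2.0, 4.0, 0.0],
        [0.0, 6.0, 8.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    utt = Utterance(frame, (1, 1, 3, 3), [0.1, 0.9], [0.3])
    print(fuse(utt))
```

Cropping before feature extraction is what would make such a pipeline cheaper than processing the full frame: only the speaker's pixels reach the encoder, and the fused vector can then be passed to any downstream emotion classifier.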
