EmoTrans Benchmark Tests Emotion Dynamics in Multimodal LLMs

ai-technology · 2026-04-29

Researchers have introduced EmoTrans, a benchmark designed to evaluate how well multimodal large language models (MLLMs) understand emotion as a dynamic process rather than as a static recognition task. The benchmark comprises 1,000 manually annotated video clips spanning 12 real-world scenarios, with over 3,000 task-specific question-answer pairs. It includes four tasks: Emotion Change Detection, Emotion Transition Prediction, Emotion State Tracking, and Emotion Context Reasoning. The work addresses a gap in existing benchmarks, which typically treat emotion understanding as a static problem. The study is published on arXiv under the identifier 2604.23348.

Key facts

EmoTrans is a benchmark for evaluating emotion dynamics understanding in multimodal videos.
It contains 1,000 carefully collected and manually annotated video clips.
The benchmark covers 12 real-world scenarios.
It provides over 3,000 task-specific question-answer pairs.
Four tasks are introduced: Emotion Change Detection, Emotion Transition Prediction, Emotion State Tracking, and Emotion Context Reasoning.
Existing benchmarks mainly formulate emotion understanding as a static recognition problem.
The research is published on arXiv with ID 2604.23348.
The work aims to assess MLLM capabilities relevant to applications such as social robots and human-computer interaction.