ARTFEED — Contemporary Art Intelligence

EmoTrans Benchmark Tests Emotion Dynamics in Multimodal LLMs

ai-technology · 2026-04-29

Researchers have introduced EmoTrans, a benchmark designed to evaluate whether multimodal large language models (MLLMs) understand emotion as a dynamic process rather than as a static recognition problem. The benchmark comprises 1,000 manually annotated video clips spanning 12 real-world scenarios, with more than 3,000 task-specific question-answer pairs. It defines four tasks: Emotion Change Detection, Emotion Transition Prediction, Emotion State Tracking, and Emotion Context Reasoning. The work addresses a gap in existing benchmarks, which mainly formulate emotion understanding as static recognition. The study is published on arXiv under the identifier 2604.23348.

Key facts

  • EmoTrans is a benchmark for evaluating how well MLLMs understand emotion dynamics in multimodal video.
  • It contains 1,000 carefully collected and manually annotated video clips.
  • The benchmark covers 12 real-world scenarios.
  • It provides over 3,000 task-specific question-answer pairs.
  • Four tasks are introduced: Emotion Change Detection, Emotion Transition Prediction, Emotion State Tracking, and Emotion Context Reasoning.
  • Existing benchmarks mainly formulate emotion understanding as a static recognition problem.
  • The research is published on arXiv with ID 2604.23348.
  • The work aims to assess MLLMs for applications such as social robots and human-computer interaction.
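
For readers who want a concrete picture of how a four-task question-answer benchmark like this might be scored, here is a minimal sketch in Python. Only the four task names come from the article. The record layout (`QAItem`), the `per_task_accuracy` helper, and the case-insensitive exact-match scoring rule are all assumptions made for illustration; the article does not describe the actual EmoTrans data format or metrics.

```python
from collections import defaultdict
from dataclasses import dataclass

# Task names as listed in the article.
TASKS = (
    "Emotion Change Detection",
    "Emotion Transition Prediction",
    "Emotion State Tracking",
    "Emotion Context Reasoning",
)


@dataclass(frozen=True)
class QAItem:
    """Hypothetical record layout, not the published EmoTrans schema."""
    clip_id: str
    task: str
    question: str
    answer: str


def per_task_accuracy(items, predictions):
    """Return {task: accuracy} for the tasks present in `items`.

    `predictions` is a list of model answers aligned with `items`.
    Scoring is case-insensitive exact match, which is an assumption
    made for this sketch.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for item, predicted in zip(items, predictions):
        if item.task not in TASKS:
            raise ValueError(f"unknown task: {item.task}")
        total[item.task] += 1
        if predicted.strip().lower() == item.answer.strip().lower():
            correct[item.task] += 1

    return {task: correct[task] / total[task] for task in total}
```

Reporting accuracy per task rather than as a single pooled number keeps a weakness in one task, for example transition prediction, from being hidden by strength in another.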

Entities

Institutions

  • arXiv
