EmoTrans Benchmark Tests Emotion Dynamics in Multimodal LLMs
Researchers have introduced EmoTrans, a benchmark designed to evaluate whether multimodal large language models (MLLMs) understand emotion as a dynamic process rather than as a static recognition task. The benchmark comprises 1,000 manually annotated video clips spanning 12 real-world scenarios, with over 3,000 task-specific question-answer pairs. It defines four tasks: Emotion Change Detection, Emotion Transition Prediction, Emotion State Tracking, and Emotion Context Reasoning. The work addresses a gap in existing benchmarks, which typically treat emotion understanding as a static problem. The paper is available on arXiv under the identifier 2604.23348.
Key facts
- EmoTrans is a benchmark for evaluating the understanding of emotion dynamics in multimodal videos.
- It contains 1,000 carefully collected and manually annotated video clips.
- The benchmark covers 12 real-world scenarios.
- It provides over 3,000 task-specific question-answer pairs.
- Four tasks are introduced: Emotion Change Detection, Emotion Transition Prediction, Emotion State Tracking, and Emotion Context Reasoning.
- Existing benchmarks mainly formulate emotion understanding as a static recognition problem.
- The research is published on arXiv with ID 2604.23348.
- The benchmark is intended to assess MLLMs for applications such as social robots and human-computer interaction.
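To make the structure described above more concrete, the following is a minimal sketch of how question-answer pairs grouped by the four tasks might be represented and scored per task. Only the four task names come from the source. The `QAItem` fields, the `per_task_accuracy` helper, the exact-match scoring rule, and the sample items are all hypothetical and invented for illustration; the benchmark's actual data format and metrics are not described here.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four task names come from the article; everything else is hypothetical.
TASKS = (
    "Emotion Change Detection",
    "Emotion Transition Prediction",
    "Emotion State Tracking",
    "Emotion Context Reasoning",
)


@dataclass(frozen=True)
class QAItem:
    clip_id: str   # hypothetical identifier for one video clip
    task: str      # one of TASKS
    question: str
    answer: str    # reference answer


def per_task_accuracy(items, predictions):
    """Return {task: fraction of items whose prediction matches the reference}.

    Matching is case-insensitive exact match, an assumed scoring rule.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        if item.task not in TASKS:
            raise ValueError(f"unknown task: {item.task}")
        total[item.task] += 1
        if pred.strip().lower() == item.answer.strip().lower():
            correct[item.task] += 1
    return {task: correct[task] / total[task] for task in total}


# Placeholder items, invented purely for illustration (not from the benchmark).
items = [
    QAItem("clip_0001", "Emotion Change Detection",
           "Does the speaker's emotion change?", "yes"),
    QAItem("clip_0001", "Emotion Transition Prediction",
           "Which transition occurs?", "calm to angry"),
    QAItem("clip_0002", "Emotion Change Detection",
           "Does the speaker's emotion change?", "no"),
]
predictions = ["Yes", "calm to sad", "yes"]
scores = per_task_accuracy(items, predictions)
```

Reporting accuracy separately for each task, rather than as one pooled number, keeps a model's strength on one task from masking weakness on another.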
Entities
Institutions
- arXiv