MultiEmo-Bench: New Benchmark for Multi-label Visual Emotion Analysis in MLLMs
Researchers introduced MultiEmo-Bench, a multi-label visual emotion analysis benchmark designed to evaluate multimodal large language models (MLLMs) on predicting the emotions an image evokes. Recent user studies found that humans sometimes prefer MLLM predictions over existing dataset labels; the authors attribute this to suboptimal annotation schemes in which annotators judge only one candidate emotion per image. Because a single image can evoke multiple emotions with varying intensities, such single-label annotation fails to capture the full emotional response and can underestimate MLLM capabilities. MultiEmo-Bench addresses this gap by providing a multi-label benchmark for more comprehensive evaluation. The paper is available on arXiv under ID 2605.14635.
Key facts
- MultiEmo-Bench is a multi-label visual emotion analysis benchmark for MLLMs.
- Recent user studies show humans may prefer MLLM predictions over existing dataset labels.
- Existing datasets use a single-label annotation scheme, limiting evaluation.
- A single image can evoke multiple emotions with varying intensities.
- The benchmark aims to properly evaluate MLLM capabilities in emotion prediction.
- The paper is published on arXiv with ID 2605.14635.
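To make the single-label vs. multi-label distinction concrete, here is a minimal sketch of how multi-label emotion predictions might be scored against multi-label ground truth using micro-averaged F1, a common multi-label metric. The emotion categories, example data, and metric choice are illustrative assumptions, not details from the MultiEmo-Bench paper.

```python
# Hypothetical sketch: scoring multi-label emotion predictions.
# Categories and data are illustrative, not from MultiEmo-Bench.

EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]

def f1_micro(gold, pred):
    """Micro-averaged F1 over binary label vectors (one vector per image)."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for gi, pi in zip(g, p):
            tp += gi and pi              # label present and predicted
            fp += (not gi) and pi        # predicted but absent
            fn += gi and (not pi)        # present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# One image can evoke several emotions at once:
gold        = [[1, 1, 0, 0, 0, 0, 0, 0]]  # evokes amusement and awe
pred_single = [[1, 0, 0, 0, 0, 0, 0, 0]]  # single-label model picks one
pred_multi  = [[1, 1, 0, 0, 0, 0, 0, 0]]  # multi-label model captures both

print(f1_micro(gold, pred_single))  # 0.666... (partial credit)
print(f1_micro(gold, pred_multi))   # 1.0
```

Under a single-label scheme, the single-label prediction above would score full marks while the second evoked emotion went unmeasured; a multi-label metric rewards capturing the complete set.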
Entities
Institutions
- arXiv