AttuneBench: New Benchmark Measures LLM Emotional Intelligence in Conversations
Researchers have introduced AttuneBench, a benchmark for assessing emotional intelligence (EI) in large language models (LLMs) through authentic multi-turn interactions. Unlike previous EI benchmarks, which depend on synthetic prompts or single-turn exchanges, AttuneBench is built on 200 real dialogues between humans and anonymized LLMs, in which participants annotated, turn by turn, their emotional state, the model's behavior, and their preferred responses. The benchmark evaluates models on several tasks, including emotion recognition, behavioral classification, preference prediction, and response quality. Results from 11 models show that performance rankings across these tasks are largely independent, suggesting that emotional intelligence comprises several distinct skills rather than a single capability. The research underscores the need for more sophisticated assessments of conversational AI.
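To make the idea of "largely independent" rankings concrete, one common way to compare two rankings is Spearman's rank correlation, where a value near 1 means the tasks order models the same way and a value near 0 means the orderings are unrelated. The sketch below is illustrative only: the article does not say which statistic the authors used, and the model names and scores are invented placeholders, not AttuneBench results.

```python
from itertools import combinations

# Hypothetical scores for illustration only; these are NOT AttuneBench results.
# Task names follow the article; model names and numbers are made up.
scores = {
    "emotion_recognition": {"model_a": 0.71, "model_b": 0.64, "model_c": 0.58, "model_d": 0.52},
    "preference_prediction": {"model_a": 0.49, "model_b": 0.66, "model_c": 0.43, "model_d": 0.61},
}


def ranks(task_scores):
    """Rank models from 1 (best score) downward. Assumes no tied scores."""
    ordered = sorted(task_scores, key=task_scores.get, reverse=True)
    return {model: position for position, model in enumerate(ordered, start=1)}


def spearman(task_x, task_y):
    """Spearman rank correlation between two tasks over the same models (no ties)."""
    rx, ry = ranks(task_x), ranks(task_y)
    n = len(rx)
    d2 = sum((rx[m] - ry[m]) ** 2 for m in rx)  # sum of squared rank differences
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Compare every pair of tasks.
for a, b in combinations(scores, 2):
    print(f"{a} vs {b}: rho = {spearman(scores[a], scores[b]):.2f}")
```

With these placeholder numbers the two tasks rank the four models in unrelated orders, so the correlation comes out at zero. A model that tops one task tells you nothing about its position on the other, which is the pattern the article describes.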
Key facts
- AttuneBench is a benchmark for LLM emotional intelligence.
- It uses 200 genuine multi-turn human-model conversations.
- Participants provided turn-by-turn annotations of emotional state, model behavior, and preferred responses.
- 11 models were evaluated on emotion recognition, behavioral classification, preference prediction, and response quality.
- Model rankings across tasks were largely independent.
- Existing EI benchmarks rely on synthetic prompts or single-turn cases.
- The paper is available on arXiv under ID 2605.21739.
- Emotional intelligence is central to human communication.
Entities
Institutions
- arXiv