MLLMs Fall Short on Grounded Personality Reasoning

ai-technology · 2026-05-23

A new study posted on arXiv (2605.22109) finds that Multimodal Large Language Models (MLLMs) struggle to perceive personality through behavioral understanding and instead rely on superficial pattern matching. The researchers introduce Grounded Personality Reasoning (GPR), a task that requires models to anchor Big Five personality ratings in observable evidence through a chain of rating, reasoning, and grounding. They also release MM-OCEAN, a dataset of 1,104 videos and 5,320 multiple-choice questions (MCQs) that includes timestamped behavioral observations and cue-grounding MCQs. A three-tier evaluation covering rating, reasoning, and grounding shows that current MLLMs perform poorly on the reasoning and grounding tiers, indicating a gap between numerical prediction and genuine behavioral understanding.
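To make the three-tier idea concrete, here is a minimal Python sketch of how rating, reasoning, and grounding could be scored separately. It is an illustration only: the summary above does not state the paper's actual metrics or data schema, so the `MCQ` record, the trait keys, the use of mean absolute error for ratings, and plain accuracy for the MCQ tiers are all assumptions.

```python
from dataclasses import dataclass

# Big Five (OCEAN) trait names; these key names are an assumption for this sketch.
TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")


@dataclass
class MCQ:
    """Hypothetical record for one multiple-choice question."""
    tier: str       # "reasoning" or "grounding" (assumed labels)
    answer: str     # gold option, e.g. "B"
    predicted: str  # option chosen by the model


def rating_error(gold: dict, pred: dict) -> float:
    """Mean absolute error over the five traits (assumed rating metric)."""
    return sum(abs(gold[t] - pred[t]) for t in TRAITS) / len(TRAITS)


def mcq_accuracy(items: list, tier: str):
    """Fraction of MCQs answered correctly in one tier, or None if the tier is empty."""
    scoped = [q for q in items if q.tier == tier]
    if not scoped:
        return None
    return sum(q.predicted == q.answer for q in scoped) / len(scoped)


def three_tier_report(gold: dict, pred: dict, items: list) -> dict:
    """Report the three tiers side by side so a gap between them is visible."""
    return {
        "rating_mae": rating_error(gold, pred),
        "reasoning_acc": mcq_accuracy(items, "reasoning"),
        "grounding_acc": mcq_accuracy(items, "grounding"),
    }
```

Keeping the tiers separate is the point of the sketch: a model could post a low rating error while scoring poorly on the reasoning and grounding questions, which is the kind of gap the study describes.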

Key facts

Study published on arXiv (2605.22109)
Introduces Grounded Personality Reasoning (GPR) task
Releases MM-OCEAN dataset with 1,104 videos and 5,320 MCQs
Dataset includes timestamped behavioral observations and cue-grounding MCQs
Three-tier evaluation: rating, reasoning, grounding
MLLMs perform poorly on reasoning and grounding tasks
Models rely on superficial pattern matching rather than behavioral understanding
Existing benchmarks evaluate only Big Five score prediction
