ARTFEED — Contemporary Art Intelligence

MLLMs Fall Short on Grounded Personality Reasoning

ai-technology · 2026-05-23

A new arXiv preprint (2605.22109) reports that Multimodal Large Language Models (MLLMs) struggle to infer personality from observed behavior and instead rely on superficial pattern matching. The researchers introduce Grounded Personality Reasoning (GPR), a task that requires models to anchor Big Five personality ratings in observable evidence through a chain of rating, reasoning, and grounding. They also release MM-OCEAN, a dataset of 1,104 videos and 5,320 multiple-choice questions (MCQs) that includes timestamped behavioral observations and cue-grounding MCQs. A three-tier evaluation covering rating, reasoning, and grounding shows that current MLLMs perform poorly on the reasoning and grounding tiers, indicating a gap between numerical score prediction and genuine behavioral understanding.
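To make the three-tier idea concrete, here is a minimal Python sketch of how such an evaluation could be tallied. It is an illustration only, not the authors' code: the `MCQ` record, the function names, the 0-to-1 rating scale, and the use of mean absolute error for the rating tier are all assumptions, since this summary does not specify the dataset schema or the metrics used.

```python
from dataclasses import dataclass
from statistics import mean

# Big Five traits (the "OCEAN" acronym).
TRAITS = ("openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism")


@dataclass
class MCQ:
    """Hypothetical record for one multiple-choice question."""
    tier: str       # "reasoning" or "grounding" (assumed labels)
    correct: str    # gold option label, e.g. "B"
    predicted: str  # option label chosen by the model


def rating_mae(gold: dict[str, float], pred: dict[str, float]) -> float:
    """Tier 1 (assumed metric): mean absolute error over the five traits."""
    return mean(abs(gold[t] - pred[t]) for t in TRAITS)


def mcq_accuracy(items: list[MCQ], tier: str) -> float:
    """Tiers 2 and 3: fraction of MCQs in the given tier answered correctly."""
    tier_items = [q for q in items if q.tier == tier]
    if not tier_items:
        raise ValueError(f"no questions for tier {tier!r}")
    return sum(q.correct == q.predicted for q in tier_items) / len(tier_items)


def three_tier_report(gold: dict[str, float],
                      pred: dict[str, float],
                      items: list[MCQ]) -> dict[str, float]:
    """Combine the three tiers into one summary dictionary."""
    return {
        "rating_mae": rating_mae(gold, pred),
        "reasoning_acc": mcq_accuracy(items, "reasoning"),
        "grounding_acc": mcq_accuracy(items, "grounding"),
    }
```

Under this sketch, a model could score a low rating error while its reasoning and grounding accuracies stay low, which is the kind of gap between numerical prediction and behavioral understanding the study describes.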

Key facts

  • Study published on arXiv (2605.22109)
  • Introduces Grounded Personality Reasoning (GPR) task
  • Releases MM-OCEAN dataset with 1,104 videos and 5,320 MCQs
  • Dataset includes timestamped behavioral observations and cue-grounding MCQs
  • Three-tier evaluation: rating, reasoning, grounding
  • MLLMs perform poorly on reasoning and grounding tasks
  • Models rely on superficial pattern matching rather than behavioral understanding
  • Existing benchmarks evaluate only Big Five score prediction

Entities

Institutions

  • arXiv