ARTFEED — Contemporary Art Intelligence

Instruction-Tuned MLLMs Show Brain Alignment During Movie Watching

ai-technology · 2026-05-22

A study on arXiv (2506.08277) investigates whether instruction-tuned multimodal large language models (IT-MLLMs) align with brain activity during naturalistic movie watching. The researchers predicted fMRI responses from the representations of six video and two audio IT-MLLMs, prompted with 13 video task instructions, and found that instruction tuning helps organize those representations around functional task demands rather than surface semantics. The work addresses a gap in prior evaluations, which focused on unimodal stimuli or non-instruction-tuned models.
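For readers unfamiliar with how fMRI responses are predicted from model representations, the general idea can be illustrated with a small encoding-model sketch. This is not the study's pipeline: ridge regression, the train/test split, the Pearson-correlation score, and every name and number below (`fit_ridge`, `encoding_score`, the array sizes) are assumptions chosen for illustration, and the data are synthetic stand-ins for model embeddings and voxel responses.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

def encoding_score(features, responses, n_train, alpha=10.0):
    """Fit on the first n_train time points, score on the held-out remainder."""
    X_train, X_test = features[:n_train], features[n_train:]
    Y_train, Y_test = responses[:n_train], responses[n_train:]
    # Standardize features using training statistics only.
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0) + 1e-8
    X_train, X_test = (X_train - mu) / sd, (X_test - mu) / sd
    # Center responses so no explicit intercept column is needed.
    y_mu = Y_train.mean(axis=0)
    W = fit_ridge(X_train, Y_train - y_mu, alpha)
    return voxelwise_pearson(Y_test, X_test @ W + y_mu)

# Synthetic stand-ins (hypothetical sizes): rows are time points,
# feature columns play the role of model embeddings, response columns of voxels.
rng = np.random.default_rng(0)
n_time, n_feat, n_vox = 400, 32, 50
features = rng.standard_normal((n_time, n_feat))
true_W = rng.standard_normal((n_feat, n_vox))
responses = features @ true_W + 2.0 * rng.standard_normal((n_time, n_vox))

scores = encoding_score(features, responses, n_train=300)
# Baseline: shuffle the time order of the features to break the alignment.
shuffled = encoding_score(rng.permutation(features), responses, n_train=300)
print("mean held-out correlation:", scores.mean())
print("mean correlation with shuffled features:", shuffled.mean())
```

In a comparison like the one the study describes, a score of this kind would be computed separately for each model and each task instruction, so that differences in predictive accuracy can be attributed to the representation rather than to the regression setup.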

Key facts

  • Study published on arXiv with ID 2506.08277
  • Investigates instruction-tuned multimodal large language models (IT-MLLMs)
  • Uses fMRI responses recorded during naturalistic movie watching (video with audio)
  • Tests six video and two audio IT-MLLMs
  • Includes 13 video task instructions
  • Finds instruction-tuning organizes representations around functional task demands
  • Prior work focused on unimodal stimuli or non-instruction-tuned models
  • Study addresses brain alignment under multimodal naturalistic stimuli

Entities

Institutions

  • arXiv
