Instruction-Tuned MLLMs Show Brain Alignment During Movie Watching

ai-technology · 2026-05-22

A study on arXiv (2506.08277) investigates whether instruction-tuned multimodal large language models (IT-MLLMs) align with brain activity during naturalistic movie watching. The researchers predicted fMRI responses from the representations of six video and two audio IT-MLLMs across 13 video task instructions, and found that instruction-tuning helps organize those representations around functional task demands rather than surface semantics. The work fills a gap left by prior evaluations, which focused on unimodal stimuli or on non-instruction-tuned models.

Key facts

Study published on arXiv with ID 2506.08277
Investigates instruction-tuned multimodal large language models (IT-MLLMs)
Uses fMRI responses recorded during naturalistic movie watching (video with audio)
Tests six video and two audio IT-MLLMs
Includes 13 video task instructions
Finds instruction-tuning organizes representations around functional task demands
Prior work focused on unimodal stimuli or non-instruction-tuned models
Study addresses brain alignment under multimodal naturalistic stimuli
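
To make "predicting fMRI responses from a model" concrete, here is a minimal sketch of a voxelwise encoding model. It assumes ridge regression and a Pearson-correlation score, a common setup in brain-alignment work; the summary above does not say which regression or metric the study used. The data are synthetic stand-ins, and every name (`fit_ridge`, `voxelwise_pearson`, `encoding_score`) is hypothetical. Real analyses would also handle hemodynamic delay and cross-validate the regularization strength, both omitted here.

```python
import numpy as np


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom


def encoding_score(features, responses, n_train, alpha=1.0):
    """Fit on the first n_train time points, score on the held-out remainder."""
    X_train, X_test = features[:n_train], features[n_train:]
    Y_train, Y_test = responses[:n_train], responses[n_train:]

    # Standardize features using training statistics only.
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    X_train = (X_train - mu) / sigma
    X_test = (X_test - mu) / sigma

    # Center responses so the ridge fit needs no intercept term.
    y_mean = Y_train.mean(axis=0)
    W = fit_ridge(X_train, Y_train - y_mean, alpha)
    Y_pred = X_test @ W + y_mean
    return voxelwise_pearson(Y_test, Y_pred)


# Synthetic stand-ins (assumption): rows are time points, feature columns play
# the role of model representations, response columns play the role of voxels.
rng = np.random.default_rng(0)
n_time, n_features, n_voxels = 300, 20, 50
features = rng.standard_normal((n_time, n_features))
true_weights = rng.standard_normal((n_features, n_voxels))
noise = 0.5 * rng.standard_normal((n_time, n_voxels))
responses = features @ true_weights + noise

scores = encoding_score(features, responses, n_train=200)
print("mean voxelwise correlation:", scores.mean())
```

In this framing, comparing models or task instructions amounts to swapping the feature matrix and comparing the resulting per-voxel scores on the same held-out data.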
