Brain Alignment of Vision-Language and Action Models During Gameplay

ai-technology · 2026-05-20

A study published on arXiv (2605.19352) investigates how vision-language models (VLMs) and large-action models (LAMs) align with human brain activity during naturalistic gameplay. Using fMRI recordings from participants playing Atari-style video games, the researchers examined how action-focused and reasoning-focused prompts shape the models' internal representations. Both VLMs and LAMs showed significant alignment with brain activity. The study addresses a gap in interactive brain-encoding research, where earlier work was limited to passive tasks or reinforcement-learning agents.

Key facts

Study published on arXiv with ID 2605.19352
Uses fMRI recordings from participants playing Atari-style video games
Compares vision-language models (VLMs) and large-action models (LAMs)
Examines action-focused and reasoning-focused prompts
Both model families exhibit significant brain alignment
Addresses gap in interactive brain-encoding studies
Previous studies limited to reinforcement-learning agents or passive tasks
Research at intersection of neuroscience and machine learning
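
For readers unfamiliar with the term, "brain alignment" in encoding studies is commonly scored by fitting a linear map from model features to voxel responses and measuring how well it predicts held-out brain data. The sketch below illustrates that general idea only; it is an assumption about the typical approach, not the pipeline used in the study. The data are synthetic, and the names `ridge_fit`, `voxelwise_pearson`, and `encoding_score`, along with the ridge penalty and the 80/20 split, are hypothetical choices made for illustration.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation per voxel (column) between measured and predicted responses."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

def encoding_score(features, voxels, alpha=10.0, train_frac=0.8):
    """Fit on the first part of the recording, score on the held-out remainder."""
    n_train = int(train_frac * features.shape[0])
    # Standardize features using training statistics only.
    mu = features[:n_train].mean(axis=0)
    sigma = features[:n_train].std(axis=0)
    X_train = (features[:n_train] - mu) / sigma
    X_test = (features[n_train:] - mu) / sigma
    # Center the voxel responses so no explicit intercept is needed.
    y_mu = voxels[:n_train].mean(axis=0)
    W = ridge_fit(X_train, voxels[:n_train] - y_mu, alpha)
    return voxelwise_pearson(voxels[n_train:], X_test @ W + y_mu)

# Synthetic stand-ins (assumption): rows are time points, feature columns play
# the role of model activations, voxel columns play the role of fMRI responses.
rng = np.random.default_rng(0)
n_timepoints, n_features, n_voxels = 400, 20, 50
features = rng.standard_normal((n_timepoints, n_features))
true_map = rng.standard_normal((n_features, n_voxels))
voxels = features @ true_map + 2.0 * rng.standard_normal((n_timepoints, n_voxels))

scores = encoding_score(features, voxels)
# Control: shuffling time points breaks the feature-to-brain correspondence.
shuffled = encoding_score(rng.permutation(features), voxels)
print(f"mean held-out correlation: {scores.mean():.2f}")
print(f"mean correlation with shuffled features: {shuffled.mean():.2f}")
```

The shuffled control gives a rough sense of what "significant" alignment is compared against: a real analysis would typically use a proper permutation or bootstrap test and account for the temporal structure of fMRI signals, which this sketch ignores.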
