Brain Alignment of Vision-Language and Action Models During Gameplay
A study posted on arXiv (2605.19352) investigates how vision-language models (VLMs) and large-action models (LAMs) align with human brain activity during naturalistic gameplay. Using fMRI recordings from participants playing Atari-style video games, the researchers examined how action-focused and reasoning-focused prompts shape the models' internal representations. Both VLMs and LAMs showed significant alignment with brain activity. The work addresses a gap in brain-encoding research on interactive behavior, where earlier studies were limited to passive tasks or reinforcement-learning agents.
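The summary does not say how alignment was computed. As an illustration only, brain-encoding studies commonly fit a regularized linear map from model features to voxel responses and score it by the correlation between predicted and measured activity on held-out time points. The sketch below shows that general recipe on synthetic data; it is not the study's pipeline. All function names and parameters (`simulate`, `fit_ridge`, `alignment_scores`, `alpha`, `train_frac`) are hypothetical, the random features stand in for VLM or LAM activations, and hemodynamic delay is ignored.

```python
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b (a is n x n, b is n x m) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    Xt = transpose(X)
    gram = matmul(Xt, X)
    for i in range(len(gram)):
        gram[i][i] += alpha
    return solve(gram, matmul(Xt, Y))


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv) if su and sv else 0.0


def simulate(n_time=200, n_feat=5, n_vox=3, noise=0.5, seed=0):
    """Synthetic stand-in: voxel responses are a noisy linear mix of features."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 1) for _ in range(n_vox)] for _ in range(n_feat)]
    X = [[rng.gauss(0, 1) for _ in range(n_feat)] for _ in range(n_time)]
    Y = [[y + rng.gauss(0, noise) for y in row] for row in matmul(X, W)]
    return X, Y


def alignment_scores(X, Y, alpha=1.0, train_frac=0.8):
    """Fit on the first part of the run, score each voxel on the held-out rest."""
    split = int(len(X) * train_frac)
    W = fit_ridge(X[:split], Y[:split], alpha)
    pred = matmul(X[split:], W)
    return [pearson(p, y) for p, y in zip(transpose(pred), transpose(Y[split:]))]


X, Y = simulate()
scores = alignment_scores(X, Y)
print([round(s, 2) for s in scores])  # one held-out correlation per voxel
```

In a real analysis the feature matrix would come from a chosen model layer under a given prompt, so comparing scores across prompts or model families would mean refitting this map for each feature set. A numerical library would normally replace the hand-written linear algebra; plain Python is used here only to keep the sketch dependency-free.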
Key facts
- Study published on arXiv with ID 2605.19352
- Uses fMRI recordings from participants playing Atari-style video games
- Compares vision-language models (VLMs) and large-action models (LAMs)
- Examines action-focused and reasoning-focused prompts
- Both model families exhibit significant brain alignment
- Addresses gap in interactive brain-encoding studies
- Previous studies limited to reinforcement-learning agents or passive tasks
- Research at the intersection of neuroscience and machine learning
Entities
Institutions
- arXiv