Generative Visual Grounding Enhances EEG Understanding in MLLMs

ai-technology · 2026-05-20

A new framework called Generative Visual Grounding (GVG) uses EEG-to-image generation to improve how multimodal large language models (MLLMs) interpret brain signals. Rather than aligning EEG data with text alone, GVG generates proxy images that supply structured visual context, letting MLLMs draw on their visual priors when interpreting clinical states. The approach was validated on two backbones, GVG-X-Omni and GVG-Janus. The lightweight GVG-X-Omni matches 1.7B-parameter text-aligned baselines while tuning only 170M parameters. The research, published on arXiv (2605.18172), addresses the scarcity of visually evoked EEG datasets and aims to preserve fine-grained perceptual information that is often lost in text-only translation.
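To make the data flow concrete, here is a toy sketch of the idea of "visual grounding" as described above. The EEG is first translated into a proxy image, and that image is paired with a text prompt as MLLM input, instead of the EEG being mapped directly to text. Everything here is an illustrative assumption: `eeg_to_proxy_image`, `build_mllm_request`, the array sizes, and the single random linear map are hypothetical stand-ins, not the GVG generator or either backbone described in the paper.

```python
import numpy as np

# Hypothetical sizes chosen only for illustration.
N_CHANNELS, N_SAMPLES, IMG_SIZE = 4, 16, 8


def eeg_to_proxy_image(eeg: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Toy stand-in for an EEG-to-image generator (NOT the GVG model).

    Flattens a (channels, samples) EEG window, applies one linear map,
    and squashes the result into [0, 1] so it reads as a grayscale image.
    """
    flat = eeg.reshape(-1)                   # (channels * samples,)
    logits = weights @ flat                  # (IMG_SIZE * IMG_SIZE,)
    pixels = 1.0 / (1.0 + np.exp(-logits))   # sigmoid -> [0, 1]
    return pixels.reshape(IMG_SIZE, IMG_SIZE)


def build_mllm_request(eeg: np.ndarray, weights: np.ndarray, question: str) -> dict:
    """Pair the proxy image with a text prompt: the 'visual context plus
    instruction' input an MLLM would receive (hypothetical format)."""
    return {"image": eeg_to_proxy_image(eeg, weights), "text": question}


rng = np.random.default_rng(0)
eeg_window = rng.standard_normal((N_CHANNELS, N_SAMPLES))
# Random weights stand in for a trained translator.
translator_weights = rng.standard_normal((IMG_SIZE * IMG_SIZE, N_CHANNELS * N_SAMPLES)) * 0.1

request = build_mllm_request(
    eeg_window, translator_weights, "What clinical state does this EEG suggest?"
)
print(request["image"].shape)  # (8, 8)
```

In a real system, the translator would be a trained generative model, and the MLLM's vision encoder would consume the image. The point of the sketch is only that the model receives an image-shaped intermediate rather than text alone.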

Key facts

GVG framework uses EEG-to-image generative model as visual translator
Validated on GVG-X-Omni and GVG-Janus backbones
GVG-X-Omni matches 1.7B-parameter text-aligned baselines
Only 170M parameters tuned for GVG-X-Omni
Addresses scarcity of visually evoked EEG datasets
Preserves fine-grained perceptual information
Published on arXiv with ID 2605.18172
Enables clinical-state interpretation via visual priors
