ARTFEED — Contemporary Art Intelligence

MLLMs Outperform CNNs in Seizure Video Analysis

ai-technology · 2026-05-07

A recent pilot study posted to arXiv investigates how well multimodal large language models (MLLMs) identify pathological movements in seizure videos in a zero-shot setting, without task-specific training. Evaluated on 90 clinical recordings against 20 semiological features defined by the International League Against Epilepsy (ILAE), the MLLMs surpassed fine-tuned CNN and ViT baselines on 13 of 18 features, performing strongly on prominent postural and contextual cues but struggling with subtle, rapid movements. Feature-targeted signal enhancements, such as facial cropping, pose estimation, and audio denoising, improved results on 10 of the 20 features. The study underscores the promise of MLLMs for automated analysis of seizure semiology while highlighting their current limits in detecting fine-grained motion.
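The zero-shot setup described above can be sketched roughly as follows. This is an illustrative assumption, not the study's actual protocol: the feature names, prompt wording, and the `query_mllm` stand-in are all hypothetical.

```python
# Hypothetical sketch of zero-shot semiology scoring with an MLLM.
# The feature names, prompt template, and answer parsing below are
# assumptions for illustration; the study's actual prompts and model
# interface are not described in this article.

def build_prompt(feature: str) -> str:
    """Ask the model a yes/no question about one semiological feature."""
    return (
        "You are shown frames from a clinical seizure recording. "
        f"Is the following semiological feature present: {feature}? "
        "Answer strictly 'yes' or 'no'."
    )

def parse_answer(text: str) -> bool:
    """Map a free-text model reply to a binary feature label."""
    return text.strip().lower().startswith("yes")

def score_video(frames, features, query_mllm) -> dict:
    """Query the MLLM once per feature.

    `query_mllm(frames, prompt)` is a stand-in for any multimodal
    chat API that accepts images plus text and returns a string.
    """
    return {
        feature: parse_answer(query_mllm(frames, build_prompt(feature)))
        for feature in features
    }
```

In practice `query_mllm` would wrap a real multimodal API call, and feature-targeted enhancements such as facial cropping or pose overlays would be applied to `frames` before querying.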

Key facts

  • MLLMs evaluated on 90 clinical seizure recordings
  • 20 ILAE-defined semiological features assessed
  • Zero-shot performance compared to fine-tuned CNN and ViT
  • MLLMs outperformed baselines on 13 of 18 features
  • Signal enhancement improved performance on 10 of 20 features
  • Study published on arXiv with ID 2605.03352

Entities

Institutions

  • arXiv
