ARTFEED — Contemporary Art Intelligence

VLMs Detect Attention in Educational Videos

other · 2026-05-22

A new study on arXiv (2605.20211) proposes using Vision-Language Models (VLMs) to detect learner attention in educational videos. Traditional methods rely on features engineered from eye-tracking data, such as fixations and saccades, and their performance is limited. The researchers used an eye-tracking dataset (N=70) and had a VLM analyze video content with the gaze data superimposed, aiming to capture complex engagement patterns.
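
To make the idea of superimposed gaze data concrete, here is a minimal sketch of drawing a gaze marker onto a single video frame before it would be passed to a model. It is an illustration only, not the study's pipeline: the function name overlay_gaze, the marker radius and color, and the synthetic black frame are all assumptions.

```python
import numpy as np

def overlay_gaze(frame, gaze_xy, radius=6, color=(255, 0, 0)):
    """Return a copy of `frame` (H x W x 3, uint8) with a filled disc at the gaze point.

    `gaze_xy` is (x, y) in pixel coordinates. The radius and color are
    hypothetical choices for illustration, not values from the study.
    """
    height, width = frame.shape[:2]
    out = frame.copy()  # leave the original frame untouched
    gx, gy = gaze_xy
    # Open grids of row and column indices that broadcast to an H x W mask.
    ys, xs = np.ogrid[:height, :width]
    mask = (xs - gx) ** 2 + (ys - gy) ** 2 <= radius ** 2
    out[mask] = color
    return out

# Synthetic black frame standing in for a decoded video frame.
frame = np.zeros((120, 160, 3), dtype=np.uint8)
marked = overlay_gaze(frame, gaze_xy=(80, 60))
```

Applying the same function to every frame, using the gaze sample closest in time, would produce a video in which the model can see both the lecture content and where the learner was looking.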

Key facts

  • Study uses Vision-Language Models (VLMs) for attention detection.
  • Dataset includes eye-tracking data from 70 participants.
  • Traditional methods use engineered features from fixations and saccades.
  • The VLM analyzes video content directly, with gaze data superimposed.
  • Research aims to improve attention detection in educational videos.
  • Published on arXiv with ID 2605.20211.
  • Focuses on remote and blended learning environments.
  • Prior methods struggled with the temporal nature of engagement.
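
For context on what features engineered from fixations and saccades can look like, the sketch below uses velocity-threshold identification (I-VT), a standard eye-tracking technique, to label gaze samples and derive two summary features. It is not taken from the study: the function names, the 100 deg/s threshold, and the sample data are assumptions chosen for illustration.

```python
import math

def classify_ivt(samples, velocity_threshold=100.0):
    """Label each interval between consecutive gaze samples.

    `samples` is a list of (t_seconds, x_deg, y_deg) tuples. An interval is a
    'saccade' if gaze speed exceeds the threshold (deg/s), else a 'fixation'.
    The default threshold is a hypothetical value, not one from the study.
    """
    labels = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        labels.append("saccade" if speed > velocity_threshold else "fixation")
    return labels

def engineered_features(labels):
    """Summarize interval labels into simple hand-crafted features."""
    # A saccade is counted once per run of consecutive 'saccade' intervals.
    saccade_count = sum(
        1
        for i, label in enumerate(labels)
        if label == "saccade" and (i == 0 or labels[i - 1] != "saccade")
    )
    fixation_fraction = labels.count("fixation") / len(labels)
    return {"saccade_count": saccade_count, "fixation_fraction": fixation_fraction}

# Made-up samples at 100 Hz: steady gaze, one rapid jump, steady gaze again.
samples = [
    (0.00, 0.0, 0.0),
    (0.01, 0.1, 0.0),
    (0.02, 0.2, 0.1),
    (0.03, 5.0, 0.1),
    (0.04, 5.1, 0.1),
]
labels = classify_ivt(samples)
features = engineered_features(labels)
```

Summaries like these collapse a whole viewing window into a few numbers, which is one way to see why such features can miss how engagement changes over time.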

Entities

Institutions

  • arXiv

Sources