VLMs Detect Attention in Educational Videos

other · 2026-05-22

A new study on arXiv (2605.20211) proposes using Vision-Language Models (VLMs) to detect learner attention in educational videos. Traditional methods rely on features engineered from eye-tracking data, and their performance has been limited. The researchers paired an eye-tracking dataset (N=70) with a VLM that analyzes the video content with the gaze data superimposed on it, aiming to capture complex engagement patterns.
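The summary does not say how the gaze data is drawn onto the video, so the following is only a minimal Python sketch of one way to superimpose a gaze point on a frame before it is passed to a model. The function name overlay_gaze, the nested-list frame representation, and the marker radius and color are all assumptions made for illustration; a real pipeline would more likely work on decoded video frames with an imaging library.

```python
def overlay_gaze(frame, gaze_xy, radius=3, color=(255, 0, 0)):
    """Return a copy of `frame` with a filled disc drawn at the gaze point.

    frame: list of rows, each row a list of (r, g, b) tuples.
    gaze_xy: (x, y) gaze position in pixel coordinates.
    radius, color: hypothetical marker settings, not taken from the study.
    """
    gx, gy = gaze_xy
    marked = [list(row) for row in frame]  # copy so the input stays untouched
    for y, row in enumerate(marked):
        for x in range(len(row)):
            # Paint every pixel within `radius` of the gaze point.
            if (x - gx) ** 2 + (y - gy) ** 2 <= radius ** 2:
                row[x] = color
    return marked


# Hypothetical 16x9 black frame with one gaze sample near its centre.
frame = [[(0, 0, 0)] * 16 for _ in range(9)]
marked = overlay_gaze(frame, gaze_xy=(8, 4))
```

Drawing the marker on a copy keeps the original frame available, for example to compare model output with and without the gaze overlay.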

Key facts

The study uses Vision-Language Models (VLMs) for attention detection.
The dataset includes eye-tracking data from 70 participants.
Traditional methods use features engineered from fixations and saccades.
The VLM analyzes the video content directly, with gaze data superimposed.
The research aims to improve attention detection in educational videos.
Published on arXiv with ID 2605.20211.
The focus is on remote and blended learning environments.
Prior methods struggled with the temporal nature of engagement.
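
As an illustration of the engineered-feature approach listed above, the sketch below derives a few fixation and saccade statistics from raw gaze samples using a simple velocity threshold. The function name gaze_features, the 30 deg/s threshold, the sample format, and the choice of statistics are assumptions for illustration, not details from the study.

```python
import math
from itertools import groupby


def gaze_features(samples, velocity_threshold=30.0):
    """Summarise a gaze trace with simple fixation and saccade statistics.

    samples: (t_seconds, x_deg, y_deg) tuples with strictly increasing times.
    velocity_threshold: deg/s above which movement counts as a saccade
    (an illustrative value, not taken from the study).
    """
    # Label each interval between consecutive samples by its gaze velocity.
    intervals = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        intervals.append((speed > velocity_threshold, dt))

    # Merge runs of same-labelled intervals into fixation or saccade events.
    fixations, saccades = [], []
    for is_saccade, run in groupby(intervals, key=lambda iv: iv[0]):
        duration = sum(dt for _, dt in run)
        (saccades if is_saccade else fixations).append(duration)

    mean_fixation = sum(fixations) / len(fixations) if fixations else 0.0
    return {
        "fixation_count": len(fixations),
        "mean_fixation_duration": mean_fixation,
        "saccade_count": len(saccades),
    }


# Hypothetical trace: a fixation, one fast jump to the right, another fixation.
samples = [
    (0.00, 10.0, 10.0), (0.01, 10.1, 10.0), (0.02, 10.1, 10.1),
    (0.03, 15.0, 10.0), (0.04, 15.0, 10.1), (0.05, 15.1, 10.1),
]
features = gaze_features(samples)
```

Summary statistics like these collapse the whole trace into a handful of numbers, so the order and timing of events within the video are lost, which is one way to read the point about prior methods and the temporal nature of engagement.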
