VLMs Detect Attention in Educational Videos
A new study on arXiv (2605.20211) proposes using Vision-Language Models (VLMs) to detect learner attention in educational videos. Traditional methods rely on features hand-engineered from eye-tracking data, such as fixations and saccades, and their performance is limited, in part because they struggle with the temporal nature of engagement. The researchers used an eye-tracking dataset from 70 participants and had a VLM analyze the video content with the gaze data superimposed on it, aiming to capture more complex engagement patterns.
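To make the idea of superimposing gaze on video concrete, here is a minimal sketch of drawing one gaze point onto one frame before that frame would be passed to a VLM. The summary does not describe the study's actual rendering, so everything here is an assumption for illustration: the hypothetical `overlay_gaze` function, the normalized gaze coordinates, the marker shape, and the frame represented as nested lists of RGB tuples.

```python
from math import ceil, floor
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Frame = List[List[Pixel]]      # frame[y][x], origin at the top-left corner


def overlay_gaze(
    frame: Frame,
    gaze_xy: Tuple[float, float],
    radius: int = 3,
    color: Pixel = (255, 0, 0),
) -> Frame:
    """Return a copy of `frame` with a filled disc drawn at the gaze point.

    Assumption: `gaze_xy` is normalized to 0..1 on both axes, with (0, 0)
    at the top-left corner of the frame.
    """
    height, width = len(frame), len(frame[0])
    cx = gaze_xy[0] * (width - 1)
    cy = gaze_xy[1] * (height - 1)

    # Copy each row so the input frame is left unmodified.
    out = [row[:] for row in frame]

    # Visit only the bounding box of the disc, clipped to the frame.
    y_start, y_stop = max(0, floor(cy - radius)), min(height, ceil(cy + radius) + 1)
    x_start, x_stop = max(0, floor(cx - radius)), min(width, ceil(cx + radius) + 1)
    for y in range(y_start, y_stop):
        for x in range(x_start, x_stop):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                out[y][x] = color
    return out
```

In practice the same step would be done with an image or video library on real frames; the nested-list frame only keeps the sketch dependency-free.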
Key facts
- Study uses Vision-Language Models (VLMs) for attention detection.
- Dataset includes eye-tracking data from 70 participants.
- Traditional methods use engineered features from fixations and saccades.
- The VLM analyzes the video content directly, with gaze data superimposed on it.
- Research aims to improve attention detection in educational videos.
- Published on arXiv with ID 2605.20211.
- The focus is on remote and blended learning environments.
- Prior methods struggled with the temporal nature of engagement.
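For contrast with the VLM approach, the kind of engineered fixation and saccade features that traditional methods rely on can be sketched with a simple velocity-threshold classifier. This is a generic illustration, not the baseline used in the study: the hypothetical `gaze_features` function, the `(t, x, y)` sample format, the pixel units, and the 100 px/s threshold are all assumptions.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

# One gaze sample: (time in seconds, x in pixels, y in pixels). Assumed format.
Sample = Tuple[float, float, float]


@dataclass
class GazeFeatures:
    fixation_count: int
    mean_fixation_duration: float  # seconds; 0.0 if there are no fixations
    saccade_count: int


def gaze_features(samples: List[Sample], velocity_threshold: float = 100.0) -> GazeFeatures:
    """Summarize gaze samples with a velocity-threshold rule.

    Intervals where gaze moves slower than `velocity_threshold` (px/s) are
    treated as part of a fixation; faster intervals are treated as saccades.
    The threshold value is an illustrative assumption.
    """
    fixation_durations: List[float] = []
    saccade_count = 0
    current_fixation = 0.0   # duration of the fixation being accumulated
    in_saccade = False

    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = hypot(x1 - x0, y1 - y0) / dt
        if speed < velocity_threshold:
            current_fixation += dt
            in_saccade = False
        else:
            # A fast interval closes any running fixation.
            if current_fixation > 0:
                fixation_durations.append(current_fixation)
                current_fixation = 0.0
            # Consecutive fast intervals count as a single saccade.
            if not in_saccade:
                saccade_count += 1
                in_saccade = True

    if current_fixation > 0:
        fixation_durations.append(current_fixation)

    mean_duration = (
        sum(fixation_durations) / len(fixation_durations) if fixation_durations else 0.0
    )
    return GazeFeatures(len(fixation_durations), mean_duration, saccade_count)
```

Summary statistics like these discard the order and timing of events within a clip, which is one way to read the point above about prior methods struggling with the temporal nature of engagement.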
Entities
Institutions
- arXiv