ARTFEED — Contemporary Art Intelligence

AI Assesses Nursing Competency from Egocentric Video

other · 2026-05-22

A recent study examines whether vision-language models can assess nursing skills from egocentric video recorded during simulation-based training. Its three-stage framework extracts action timelines from video using frozen visual encoders and few-shot learning, derives sequence-level features from those timelines, and relates the features to instructor competency ratings. Across 22 densely annotated sessions (3.8 hours, 493 actions), a frozen DINOv2 backbone with HMM Viterbi decoding reached 57.4% MOF (mean over frames, the share of frames labeled correctly) in leave-one-out 1-shot recognition. Recognition accuracy correlated negatively with competency (rho = -0.524, p < 0.05), which may indicate that more competent trainees act in more variable, less predictable ways. The work aims to support scalable, objective competency assessment that relies less on time-intensive expert observation and is less exposed to inter-rater variability.
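The decoding and scoring step can be illustrated with a minimal Python sketch. It is not the study's implementation: the per-frame class probabilities, the sticky transition matrix, and the function names `viterbi` and `mean_over_frames` are assumptions for illustration, and the toy numbers are invented. The idea is that an HMM that favors staying in the same action smooths noisy frame-level predictions into a coherent timeline, and MOF is then the share of frames whose decoded label matches the annotation.

```python
import numpy as np

def viterbi(log_emit, log_trans, log_prior):
    """Return the most likely state path for a (T, K) matrix of per-frame log-scores."""
    T, K = log_emit.shape
    score = log_prior + log_emit[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in state i at t-1, then moving to state j
        cand = score[:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = score.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

def mean_over_frames(pred, truth):
    """MOF: fraction of frames whose predicted label equals the annotated label."""
    return float(np.mean(np.asarray(pred) == np.asarray(truth)))

# Invented toy example with two action classes and six frames (not study data).
truth = np.array([0, 0, 0, 1, 1, 1])
p_class0 = np.array([0.9, 0.4, 0.9, 0.1, 0.6, 0.1])  # hypothetical frame-level scores
emit = np.stack([p_class0, 1.0 - p_class0], axis=1)

# Assumed "sticky" transitions: staying in the same action is far more likely.
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
prior = np.array([0.5, 0.5])

framewise = emit.argmax(axis=1)
decoded = viterbi(np.log(emit), np.log(trans), np.log(prior))

framewise_mof = mean_over_frames(framewise, truth)
decoded_mof = mean_over_frames(decoded, truth)
```

In this toy case the frame-wise argmax flips labels on the two noisy frames, while the decoded path keeps a single transition from class 0 to class 1.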

Key facts

  • Study uses vision-language models for competency assessment from egocentric video.
  • Three-stage framework: action timeline extraction, feature derivation, relation to instructor ratings.
  • 22 densely annotated sessions (3.8 hours, 493 actions) analyzed.
  • Frozen DINOv2 backbone with HMM Viterbi decoding achieves 57.4% MOF.
  • Negative trend between recognition accuracy and competency (rho = -0.524, p < 0.05).
  • Aims to reduce time-intensive expert observation and inter-rater variability.
  • Published on arXiv with ID 2605.20233.
  • Focus on simulation-based nursing education.
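
The reported correlation can be illustrated with a short Python sketch, assuming that rho denotes Spearman's rank correlation (the summary does not say so explicitly). The helper `spearman_rho` and the per-session values are hypothetical and invented for illustration. They are not data from the study.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of ranks (assumes no tied values)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Invented per-session values, for illustration only (not data from the study).
recognition_accuracy = np.array([0.70, 0.55, 0.62, 0.48, 0.66])
competency_rating = np.array([2, 5, 3, 4, 1])

rho = spearman_rho(recognition_accuracy, competency_rating)
```

A negative value means that sessions with higher ratings tended to have lower recognition accuracy. When the data contain ties, a routine that assigns average ranks, such as `scipy.stats.spearmanr`, would be the safer choice.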

Entities

Institutions

  • arXiv
