Interpretable AI Tutoring System for Presentation Skills
A closed-loop Intelligent Tutoring System (ITS) has been developed to improve learners' on-camera presentation skills through multimodal affective feedback. The system operationalizes a seven-dimensional Behaviorally Anchored Rating Scale (BARS) and uses a three-layer feedback architecture: rubric-aligned multimodal scoring, audience-perception diagnostics, and retrieval-augmented conversational coaching. Built on an XGBoost backbone, it analyzes facial, vocal, textual, and oculomotor features so that feedback is tied to observable performance indicators. Trained on 10,360 MOOC video segments, the ITS produces scores that agree with expert ratings (R² = 0.48–0.61, Spearman's ρ = 0.69–0.78, MAE = 0.43–0.57). The system supports deliberate practice by delivering actionable feedback without a human instructor. The paper is available on arXiv (ID: 2605.17468) and is relevant to AI in education and human-computer interaction.
Key facts
- The ITS uses multimodal inputs: facial, vocal, textual, and oculomotor features.
- It operationalizes a seven-dimensional Behaviorally Anchored Rating Scale (BARS).
- The feedback architecture has three layers: scoring, diagnostics, and coaching.
- The system is built on an XGBoost backbone.
- It was trained on 10,360 MOOC video segments.
- Performance metrics: R² = 0.48–0.61, Spearman's ρ = 0.69–0.78, MAE = 0.43–0.57.
- The system supports deliberate practice without human instructors.
- The paper is available on arXiv with ID 2605.17468.
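The three agreement metrics listed above (R², Spearman's ρ, MAE) can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function names and the toy `expert` and `model` score lists below are hypothetical and chosen only to show how each metric compares model scores against expert ratings.

```python
from statistics import mean

def mae(y_true, y_pred):
    """Mean absolute error between expert and model scores."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(y_true, y_pred):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(y_true), average_ranks(y_pred))

# Hypothetical toy data: expert ratings vs. model scores on one dimension.
expert = [1, 2, 3, 4, 5]
model = [1.5, 2.0, 2.5, 4.5, 4.0]

print("MAE:", mae(expert, model))
print("R^2:", r_squared(expert, model))
print("Spearman rho:", spearman(expert, model))
```

MAE is expressed in rating-scale points, R² measures the share of variance in expert scores that the model explains, and Spearman's ρ measures only whether the model orders performances the same way the experts do. That is why a system can show a higher ρ than R², as in the ranges reported above.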
Entities
Institutions
- arXiv