ARTFEED — Contemporary Art Intelligence

EduAgentBench: A Multi-Stage Benchmark for AI Tutor Agents

ai-technology · 2026-05-16

EduAgentBench is a new benchmark that evaluates language agents against real teaching processes. It comprises 150 quality-controlled tasks spanning three capability surfaces: professional pedagogical judgment, situated multi-turn tutoring, and completion of Canvas-style teaching workflows. The benchmark measures how well a tutor agent diagnoses learner state, adapts support over time, makes pedagogically justified decisions, and executes interventions in authentic learning management systems. Tasks are built with a pipeline informed by pedagogical insights, and evaluation relies on complementary verification methods.

Key facts

  • EduAgentBench is a source-grounded benchmark for evaluating tutor agents.
  • It contains 150 quality-controlled tasks.
  • Tasks cover three capability surfaces: professional pedagogical judgment, situated multi-turn tutoring, and Canvas-style teaching workflow completion.
  • The benchmark assesses diagnosis of learner state, adaptation of support, pedagogically justified decisions, and execution of interventions.
  • Tasks are constructed through a pedagogical-insight-driven pipeline.
  • Evaluation uses complementary verification.
  • The benchmark addresses the gap in measuring tutoring capabilities of language agents.
  • Effective tutor agents require more than correct answers or accurate tool calls.
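
To make the three-surface structure concrete, here is a minimal Python sketch of how per-surface results could be tallied. It is illustrative only: the article does not describe the benchmark's data format or scoring, so the `TaskResult` record, its `task_id` and `passed` fields, and the `pass_rate_by_surface` helper are all hypothetical. Only the three surface names come from the source.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Surface(Enum):
    # The three capability surfaces named in the article.
    PEDAGOGICAL_JUDGMENT = "professional pedagogical judgment"
    MULTI_TURN_TUTORING = "situated multi-turn tutoring"
    WORKFLOW_COMPLETION = "Canvas-style teaching workflow completion"


@dataclass(frozen=True)
class TaskResult:
    task_id: str      # hypothetical identifier, not from the source
    surface: Surface
    passed: bool      # hypothetical pass/fail outcome (an assumption)


def pass_rate_by_surface(results):
    """Return {Surface: fraction passed} for surfaces present in results."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        totals[r.surface] += 1
        passes[r.surface] += int(r.passed)
    return {s: passes[s] / totals[s] for s in totals}


# Toy data for illustration; these are not benchmark results.
toy = [
    TaskResult("t1", Surface.PEDAGOGICAL_JUDGMENT, True),
    TaskResult("t2", Surface.PEDAGOGICAL_JUDGMENT, False),
    TaskResult("t3", Surface.MULTI_TURN_TUTORING, True),
]
for surface, rate in pass_rate_by_surface(toy).items():
    print(f"{surface.value}: {rate:.2f}")
```

Reporting each surface separately, rather than one pooled score, reflects the article's point that a tutor agent can do well on one kind of task and poorly on another.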

Entities

Institutions

  • arXiv
