EduAgentBench: A Multi-Stage Benchmark for AI Tutor Agents
EduAgentBench is a source-grounded benchmark that evaluates language agents against authentic teaching processes. It comprises 150 quality-controlled tasks spanning three capability surfaces: professional pedagogical judgment, situated multi-turn tutoring, and Canvas-style teaching workflow completion. The benchmark measures a tutor agent's ability to diagnose learner states, adapt support over time, make pedagogically justified decisions, and execute interventions in realistic learning management systems. Tasks are constructed through a pedagogical-insight-driven pipeline and evaluated with complementary verification methods.
Key facts
- EduAgentBench is a source-grounded benchmark for evaluating tutor agents.
- It contains 150 quality-controlled tasks.
- Tasks cover three capability surfaces: professional pedagogical judgment, situated multi-turn tutoring, and Canvas-style teaching workflow completion.
- The benchmark assesses diagnosis of learner state, adaptation of support, pedagogically justified decisions, and execution of interventions.
- Tasks are constructed through a pedagogical-insight-driven pipeline.
- Evaluation combines complementary verification methods.
- The benchmark addresses the gap in measuring tutoring capabilities of language agents.
- Effective tutor agents require more than correct answers or accurate tool calls.
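The three-way split of tasks across capability surfaces can be sketched as a simple data model. This is a minimal illustration, not the benchmark's actual schema: the class name `TutorTask`, the field names, and the surface identifiers are all assumptions introduced here for clarity.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the three capability surfaces described above
# (these names are assumptions, not taken from the benchmark itself).
CAPABILITY_SURFACES = (
    "pedagogical_judgment",
    "multi_turn_tutoring",
    "teaching_workflow",
)


@dataclass
class TutorTask:
    """Illustrative record for a single benchmark task (schema is hypothetical)."""
    task_id: str
    surface: str            # one of CAPABILITY_SURFACES
    prompt: str             # the task presented to the tutor agent
    rubric: list = field(default_factory=list)  # criteria used in verification

    def __post_init__(self):
        # Reject tasks outside the three capability surfaces.
        if self.surface not in CAPABILITY_SURFACES:
            raise ValueError(f"unknown capability surface: {self.surface}")


def split_by_surface(tasks):
    """Group tasks by capability surface, mirroring the three-way task split."""
    groups = {s: [] for s in CAPABILITY_SURFACES}
    for task in tasks:
        groups[task.surface].append(task)
    return groups


tasks = [
    TutorTask("pj-001", "pedagogical_judgment",
              "Select the most appropriate hint for this learner error.",
              rubric=["pedagogical soundness"]),
    TutorTask("mt-001", "multi_turn_tutoring",
              "Continue this tutoring dialogue, adapting to the learner's state.",
              rubric=["adaptivity"]),
]
groups = split_by_surface(tasks)
```

A grouping like this would let an evaluation harness report per-surface scores rather than a single aggregate, which matches the benchmark's framing of distinct capability surfaces.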
Entities
Institutions
- arXiv