New Dataset EDU-CIRCUIT-HW Evaluates MLLMs on STEM Handwritten Solutions
Researchers have released EDU-CIRCUIT-HW, a dataset of over 1,300 authentic handwritten student solutions from a university-level STEM course, to evaluate Multimodal Large Language Models (MLLMs). The dataset addresses the lack of domain-specific benchmarks for interpreting handwritten content that mixes mathematical formulas, diagrams, and textual reasoning. Current evaluation methods focus on downstream task outcomes such as auto-grading, which probe only a subset of the recognized content. EDU-CIRCUIT-HW instead pairs each solution with expert-verified verbatim transcriptions and grading reports, enabling direct assessment of how well MLLMs understand complex handwritten logic. The work aims to strengthen AI's role in education and reduce teacher workload.
Key facts
- Dataset contains 1,300+ authentic student handwritten solutions from a university-level STEM course.
- EDU-CIRCUIT-HW evaluates MLLMs on interpreting handwritten content with formulas, diagrams, and reasoning.
- Current evaluation paradigms rely on downstream tasks such as auto-grading, which fail to capture full understanding of the handwritten content.
- The dataset uses expert-verified verbatim transcriptions and grading reports.
- The research aims to advance AI's role in traditional education and reduce teachers' workload.
- MLLMs hold promise for education but lack authentic benchmarks for handwritten solutions.
- The dataset bridges the gap in evaluating MLLMs on complex handwritten logic.
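One natural way to score a model against expert-verified verbatim transcriptions is character error rate (CER), the edit distance between model output and reference, normalized by reference length. The sketch below is a minimal illustration of that idea, not the paper's actual evaluation protocol; the example strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between strings a and b via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cer(hypothesis: str, reference: str) -> float:
    """Character error rate: edit distance / reference length."""
    return levenshtein(hypothesis, reference) / max(len(reference), 1)

# Hypothetical example: a model misreads "I" as "1" in Ohm's law.
reference = "V = I * R"
hypothesis = "V = 1 * R"
print(f"CER: {cer(hypothesis, reference):.3f}")  # one substitution over 9 chars
```

A real benchmark would aggregate such scores over all solutions and would also need metrics for diagrams and reasoning steps, which plain CER does not capture.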
- The work is published on arXiv with ID 2602.00095.
Entities
Institutions
- arXiv