ARTFEED — Contemporary Art Intelligence

New Dataset EDU-CIRCUIT-HW Evaluates MLLMs on STEM Handwritten Solutions

ai-technology · 2026-05-01

Researchers have released EDU-CIRCUIT-HW, a dataset of more than 1,300 authentic handwritten solutions from university-level STEM students, built to evaluate multimodal large language models (MLLMs). The dataset addresses the lack of domain-specific benchmarks for interpreting handwritten content that mixes mathematical formulas, diagrams, and textual reasoning. Current evaluation methods focus on downstream task outcomes such as auto-grading, which probe only a subset of the recognized content. EDU-CIRCUIT-HW instead pairs each solution with expert-verified transcriptions and grading reports, allowing direct assessment of how well MLLMs understand complex handwritten logic. The work aims to strengthen AI's role in education and reduce teacher workload.

Key facts

  • Dataset contains 1,300+ authentic student handwritten solutions from a university-level STEM course.
  • EDU-CIRCUIT-HW evaluates MLLMs on interpreting handwritten content with formulas, diagrams, and reasoning.
  • Current evaluation paradigms rely on downstream tasks like auto-grading, which fail to capture full understanding.
  • The dataset uses expert-verified verbatim transcriptions and grading reports.
  • The research aims to revolutionize traditional education and reduce teachers' workload.
  • MLLMs hold promise for education but lack authentic benchmarks for handwritten solutions.
  • The dataset bridges the gap in evaluating MLLMs on complex handwritten logic.
  • The work is published on arXiv with ID 2602.00095.
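Because the dataset supplies expert-verified verbatim transcriptions, a natural first evaluation is transcription accuracy. A minimal sketch, assuming character error rate (CER) over the verbatim text as the metric (the paper's actual scoring protocol may differ); the sample strings below are hypothetical:

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, via dynamic programming."""
    m, n = len(ref), len(hyp)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits needed to turn the model output into
    the expert transcription, normalized by reference length."""
    if not reference:
        raise ValueError("empty reference transcription")
    return edit_distance(reference, hypothesis) / len(reference)

# Hypothetical example: expert transcription vs. an MLLM's reading,
# where the model has confused the symbol 'I' with the digit '1'.
expert = "V_out = I * R_2"
model  = "V_out = 1 * R_2"
print(f"CER: {cer(expert, model):.3f}")  # → CER: 0.067
```

A downstream auto-grading score can look fine even when such symbol-level misreadings occur, which is why a transcription-level metric probes the recognized content more directly.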

Entities

Institutions

  • arXiv
