ARTFEED — Contemporary Art Intelligence

LLM-Based Grading of Handwritten Math Shows High Accuracy

other · 2026-05-20

A recent study, posted on arXiv, examines whether vision-capable large language models (LLMs) can automate the grading of handwritten math assignments. It extends an earlier pipeline built for typed answers by combining transcription and rubric-based assessment of photographed submissions in a single LLM call. The evaluation used student submissions from two university STEM courses, with AI grades compared against human-assigned ground truth at the rubric-item level. The study reports high accuracy, and 87% of the best-performing model's errors were traced to transcription failures rather than to misapplied rubrics. It also categorizes common error types and points to the potential of LLMs for scalable assessment in real educational settings.
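To make the single-call idea concrete, here is a minimal sketch of how transcription and rubric checks might be requested in one prompt and read back from one reply. It is not the study's implementation: the function names, the prompt wording, and the JSON reply shape are all assumptions made for illustration, and no particular model API is called.

```python
import json


def build_grading_prompt(problem: str, rubric_items: list[str]) -> str:
    """Build one prompt asking for a transcription AND per-item rubric verdicts.

    Hypothetical prompt wording; the study's actual prompt is not given in the summary.
    """
    lines = [
        "You are grading a photographed handwritten solution.",
        f"Problem: {problem}",
        "Step 1: transcribe the handwritten work verbatim.",
        "Step 2: for each rubric item, decide whether it is met.",
        "Rubric items:",
    ]
    lines += [f"{i}. {item}" for i, item in enumerate(rubric_items, start=1)]
    # Assumed reply format, chosen here only so the reply is machine-readable.
    lines.append(
        'Reply with JSON only: {"transcription": str, '
        '"items": [{"index": int, "met": bool}]}'
    )
    return "\n".join(lines)


def parse_grading_reply(reply: str, n_items: int) -> tuple[str, list[bool]]:
    """Parse the assumed JSON reply into (transcription, verdicts in rubric order)."""
    data = json.loads(reply)
    verdicts = {entry["index"]: bool(entry["met"]) for entry in data["items"]}
    missing = [i for i in range(1, n_items + 1) if i not in verdicts]
    if missing:
        raise ValueError(f"reply is missing rubric items: {missing}")
    return data["transcription"], [verdicts[i] for i in range(1, n_items + 1)]


# Illustrative rubric and a hand-written stand-in for a model reply.
rubric = ["Sets up the integral correctly", "Evaluates the integral correctly"]
prompt = build_grading_prompt("Integrate x^2 from 0 to 1.", rubric)
fake_reply = (
    '{"transcription": "x^3/3 from 0 to 1 = 1/3", '
    '"items": [{"index": 1, "met": true}, {"index": 2, "met": true}]}'
)
transcription, met = parse_grading_reply(fake_reply, len(rubric))
```

Keeping the transcription in the reply, rather than only the verdicts, is what would let a reviewer later tell a misread symbol apart from a misapplied rubric.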

Key facts

  • arXiv:2605.19043v1
  • Automated grading of handwritten mathematics using vision-capable LLMs
  • Extends prior pipeline for typed responses
  • Integrates transcription and rubric-based evaluation in single LLM call
  • Evaluated on student work from two university STEM courses
  • Compared AI grading against human-assigned ground truth at rubric-item level
  • 87% of errors in best model due to transcription failures
  • Study categorizes common error types
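
The rubric-item comparison and the error breakdown listed above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the record fields, the `error_cause` labels, and the sample records are hypothetical and do not reproduce any data from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ItemGrade:
    """One rubric item on one submission, graded by both a human and the AI."""
    submission_id: str
    rubric_item: str
    human_met: bool                     # human-assigned ground truth
    ai_met: bool                        # AI verdict for the same item
    error_cause: Optional[str] = None   # hypothetical label for disagreements


def item_level_accuracy(grades: list[ItemGrade]) -> float:
    """Fraction of rubric items where the AI verdict matches the human one."""
    if not grades:
        raise ValueError("no grades to score")
    return sum(g.human_met == g.ai_met for g in grades) / len(grades)


def error_share(grades: list[ItemGrade], cause: str) -> float:
    """Among AI/human disagreements, the fraction attributed to `cause`."""
    errors = [g for g in grades if g.human_met != g.ai_met]
    if not errors:
        return 0.0
    return sum(g.error_cause == cause for g in errors) / len(errors)


# Made-up records for illustration only.
sample = [
    ItemGrade("s1", "setup", True, True),
    ItemGrade("s1", "result", True, False, "transcription"),
    ItemGrade("s2", "setup", False, False),
    ItemGrade("s2", "result", True, True),
    ItemGrade("s3", "setup", False, True, "rubric"),
]

print(item_level_accuracy(sample))           # 0.6
print(error_share(sample, "transcription"))  # 0.5
```

A figure such as the reported 87% corresponds to the second quantity: it is a share of the errors, not of all graded items.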

Entities

Institutions

  • arXiv
