New Metric PINK Exposes Over-Correction in Handwritten Math OCR

ai-technology · 2026-04-29

An arXiv preprint (2604.22774) reports that Vision-Language Models (VLMs) frequently over-correct when transcribing multi-line handwritten math: they silently fix the writer's mistakes, hiding errors that educational AI should detect. The authors propose PINK (Penalized INK-based score), a semantic evaluation metric that uses an LLM for rubric-based grading and penalizes over-correction. The research is the first systematic study of multi-line handwritten math OCR and evaluates 15 state-of-the-art models.
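To make the idea of penalizing over-correction concrete, here is a toy sketch. It is not the PINK formula, which this article does not give, and it swaps the paper's LLM-based rubric grading for plain string comparison. The names `Line`, `normalize`, `toy_penalized_score` and the `penalty` parameter are all hypothetical, as is the worked example of a student who writes 2x = 5 where 2x = 4 would be correct.

```python
from dataclasses import dataclass


@dataclass
class Line:
    written: str    # what the student actually wrote (may contain a math error)
    corrected: str  # what a mathematically correct version of the line would be


def normalize(text: str) -> str:
    """Drop all whitespace so 'x = 2' and 'x=2' compare equal."""
    return "".join(text.split())


def toy_penalized_score(lines, transcription, penalty=1.0):
    """Hypothetical score in [0, 1]; NOT the metric defined in the paper.

    A line counts as faithful when the OCR output matches what was written.
    It counts as over-corrected when the output instead matches the corrected
    form, and each over-corrected line subtracts `penalty` from the total.
    """
    if len(lines) != len(transcription):
        raise ValueError("expected one transcribed line per reference line")
    faithful = 0
    over_corrected = 0
    for ref, out in zip(lines, transcription):
        out_norm = normalize(out)
        if out_norm == normalize(ref.written):
            faithful += 1
        elif out_norm == normalize(ref.corrected):
            over_corrected += 1
    raw = (faithful - penalty * over_corrected) / len(lines)
    return max(0.0, raw)


# The student makes an arithmetic slip on the second line (7 - 3 is 4, not 5).
reference = [
    Line(written="2x + 3 = 7", corrected="2x + 3 = 7"),
    Line(written="2x = 5", corrected="2x = 4"),
    Line(written="x = 2.5", corrected="x = 2"),
]

faithful_ocr = ["2x + 3 = 7", "2x = 5", "x = 2.5"]   # keeps the student's error
over_corrected_ocr = ["2x + 3 = 7", "2x = 4", "x = 2"]  # silently fixes it
misread_ocr = ["2x + 3 = 7", "2x = S", "x = 2.5"]    # ordinary recognition error

print(toy_penalized_score(reference, faithful_ocr))
print(toy_penalized_score(reference, over_corrected_ocr))
print(toy_penalized_score(reference, misread_ocr))
```

In this sketch the over-corrected transcription scores lower than the one with an ordinary misread, even though its output is valid math. A surface-overlap score would not separate the two cases this way, since each differs from what the student wrote by only a few characters.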

Key facts

arXiv paper 2604.22774 identifies over-correction in VLMs for handwritten math OCR.
PINK metric uses LLM-based rubric grading to penalize over-correction.
First systematic study of multi-line handwritten math OCR.
15 state-of-the-art models evaluated.
Current metrics such as BLEU fail for multi-line expressions.
Over-correction hides student errors from educational assessment.
Prior studies focused on single-line expressions.
Study aims to improve educational AI systems.
