New Metric PINK Exposes Over-Correction in Handwritten Math OCR
An arXiv preprint (2604.22774) reports that Vision-Language Models (VLMs) frequently over-correct when transcribing multi-line handwritten math: they silently fix errors in the written work, hiding mistakes that educational AI should detect. The authors propose PINK (Penalized INK-based score), a semantic evaluation metric in which an LLM grades transcriptions against a rubric and penalizes over-correction. The paper is presented as the first systematic study of multi-line handwritten math OCR and evaluates 15 state-of-the-art models.
Key facts
- arXiv paper 2604.22774 identifies over-correction in VLMs for handwritten math OCR.
- PINK metric uses LLM-based rubric grading to penalize over-correction.
- First systematic study of multi-line handwritten math OCR.
- 15 state-of-the-art models evaluated.
- Existing evaluation metrics such as BLEU fail for multi-line expressions.
- Over-correction hides student errors from educational assessment.
- Prior studies focused on single-line expressions.
- Study aims to help educational AI systems detect student errors rather than mask them.
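To make the over-correction problem concrete, the following is a minimal Python sketch. It is not the PINK metric and not BLEU: it uses a simple unigram-overlap F1 as a stand-in for surface-level scoring, and an exact-match check as a stand-in for an over-correction flag. The student work, the function names `unigram_f1` and `is_over_corrected`, and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter


def unigram_f1(reference: str, hypothesis: str) -> float:
    """Token-overlap F1 between two whitespace-tokenized strings (toy stand-in, not BLEU)."""
    ref = Counter(reference.split())
    hyp = Counter(hypothesis.split())
    overlap = sum((ref & hyp).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def is_over_corrected(written: str, corrected: str, transcript: str) -> bool:
    """True when the transcript reproduces the corrected work instead of what was written."""
    return written != corrected and transcript == corrected


# Hypothetical three-line student solution; the last line contains a slip (x = 3).
written = "2 x + 3 = 7\n2 x = 4\nx = 3"
# What the student should have written.
corrected = "2 x + 3 = 7\n2 x = 4\nx = 2"
# A model that silently repairs the slip while transcribing.
transcript = corrected

overlap = unigram_f1(written, transcript)
flag = is_over_corrected(written, corrected, transcript)

print(f"token-overlap F1: {overlap:.3f}")
print(f"over-corrected:   {flag}")
```

In this toy case 12 of 13 tokens match, so the overlap score stays high even though the only student error on the page has disappeared from the transcript. A metric that penalizes over-correction would have to treat that single changed token as a serious failure, which is the gap the summary above attributes to PINK's rubric-based grading.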
Entities
Institutions
- arXiv