Specialized 3B Model Outperforms Frontier AI on OCR Quality, Cost, and Stability

ai-technology · 2026-05-22

In April, Dharma introduced DharmaOCR, a set of tailored small language models designed for structured OCR, along with a benchmark and accompanying paper. A specialized model with 3 billion parameters achieved a composite score of 0.911 on the benchmark, surpassing Claude Opus 4.6 (0.833), Gemini 3.1 Pro (0.820), and GPT-5.4 (0.750). Its operational cost was approximately fifty-two times lower per million pages compared to Claude Opus 4.6, and it recorded the lowest text degeneration rate at 0.20%. The paper posits that alignment with deployment tasks is more critical than the number of parameters. Additionally, a model already fine-tuned for general OCR showed greater gains from domain-specific adjustments than a general-purpose model of similar architecture, challenging the notion that larger models are always superior for enterprise AI tasks.

Key facts

Dharma released DharmaOCR, a pair of specialized small language models for structured OCR, in April.
A 3-billion-parameter specialized model scored 0.911 on the benchmark's composite score.
Claude Opus 4.6 scored 0.833, Gemini 3.1 Pro 0.820, GPT-5.4 0.750, Google Vision 0.686, Google Document AI 0.640, GPT-4o 0.635, Amazon Textract 0.618, Mistral OCR 3 0.574.
The specialized 3B model operated at roughly fifty-two times lower cost per million pages than Claude Opus 4.6.
The specialized 3B model had a text degeneration rate of 0.20%, the lowest evaluated.
The paper claims distributional alignment to the deployment task is more decisive than parameter count.
Specialization compounds: a model already specialized for general OCR benefited more from domain-specific fine-tuning than a general-purpose model.
The findings challenge the assumption that larger frontier models are always the best choice for enterprise AI workloads.

Entities

Institutions

Dharma
Hugging Face
OpenAI
Anthropic
Google
Amazon

Sources

Hugging Face Blog — 2026-05-22