Specialized 3B Model Outperforms Frontier AI on OCR Quality, Cost, and Stability
In April, Dharma introduced DharmaOCR, a set of tailored small language models designed for structured OCR, along with a benchmark and accompanying paper. A specialized model with 3 billion parameters achieved a composite score of 0.911 on the benchmark, surpassing Claude Opus 4.6 (0.833), Gemini 3.1 Pro (0.820), and GPT-5.4 (0.750). Its operational cost was approximately fifty-two times lower per million pages compared to Claude Opus 4.6, and it recorded the lowest text degeneration rate at 0.20%. The paper posits that alignment with deployment tasks is more critical than the number of parameters. Additionally, a model already fine-tuned for general OCR showed greater gains from domain-specific adjustments than a general-purpose model of similar architecture, challenging the notion that larger models are always superior for enterprise AI tasks.
Key facts
- Dharma released DharmaOCR, a pair of specialized small language models for structured OCR, in April.
- A 3-billion-parameter specialized model scored 0.911 on the benchmark's composite score.
- Claude Opus 4.6 scored 0.833, Gemini 3.1 Pro 0.820, GPT-5.4 0.750, Google Vision 0.686, Google Document AI 0.640, GPT-4o 0.635, Amazon Textract 0.618, Mistral OCR 3 0.574.
- The specialized 3B model operated at roughly fifty-two times lower cost per million pages than Claude Opus 4.6.
- The specialized 3B model had a text degeneration rate of 0.20%, the lowest evaluated.
- The paper claims distributional alignment to the deployment task is more decisive than parameter count.
- Specialization compounds: a model already specialized for general OCR benefited more from domain-specific fine-tuning than a general-purpose model.
- The findings challenge the assumption that larger frontier models are always the best choice for enterprise AI workloads.
Entities
Institutions
- Dharma
- Hugging Face
- OpenAI
- Anthropic
- Amazon