NVIDIA Releases Nemotron OCR v2 Multilingual Model Trained on 12 Million Synthetic Images
NVIDIA has introduced Nemotron OCR v2, a multilingual optical character recognition model that processes 34.7 pages per second on a single A100 GPU. The model supports six languages: English, Japanese, Korean, Russian, Chinese, and one additional, unspecified language. It was trained on 12 million synthetic images derived from the mOSCAR multilingual web corpus. Built around a shared detection backbone (RegNetX-8GF), it improves Normalized Edit Distance scores on non-English languages from 0.56–0.92 to 0.035–0.069 (lower is better). Character support grows from 855 to 14,244 characters, now covering CJK and Cyrillic scripts. The dataset is published at nvidia/OCR-Synthetic-Multilingual-v1, and the model is available at nvidia/nemotron-ocr-v2. Key contributors include Bo Liu, Théo Viel, and Mike Ranzinger.
Key facts
- Nemotron OCR v2 processes 34.7 pages/second on a single A100 GPU
- Trained on 12 million synthetic images across six languages
- Uses synthetic data pipeline based on modified SynthDoG from Donut project
- Supports English, Japanese, Korean, Russian, Chinese, and one other language
- Normalized Edit Distance scores improved to 0.035–0.069 on non-English languages
- Dataset available at nvidia/OCR-Synthetic-Multilingual-v1 under CC-BY-4.0
- Model available at nvidia/nemotron-ocr-v2 under NVIDIA Open Model License
- Features shared detection backbone (RegNetX-8GF) for efficiency
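The Normalized Edit Distance (NED) figures above can be made concrete with a small sketch. One common definition, assumed here (the source does not spell out its exact normalization), divides the Levenshtein edit distance between predicted and ground-truth text by the length of the longer string, so 0.0 is a perfect transcription and values near 1.0 indicate a complete mismatch:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance
    # (unit cost for insertions, deletions, and substitutions).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def normalized_edit_distance(pred: str, target: str) -> float:
    # NED = edit distance / length of the longer string;
    # 0.0 = perfect match, 1.0 = complete mismatch.
    if not pred and not target:
        return 0.0
    return levenshtein(pred, target) / max(len(pred), len(target))
```

Under this definition, the reported non-English scores of 0.035–0.069 correspond to a few edits per hundred characters of recognized text.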
Entities
People
- Bo Liu
- Théo Viel
- Mike Ranzinger
Institutions
- NVIDIA
Projects, tools, and benchmarks
- Donut project
- mOSCAR
- SynthDoG
- HierText
- OmniDocBench
- PaddleOCR
- OpenOCR
- FOTS
Architectures
- RegNetX-8GF
- Transformer
Fonts
- Google Fonts
- Noto family
Licenses
- CC-BY-4.0
- NVIDIA Open Model License