NVIDIA Releases Nemotron OCR v2 Multilingual Model Trained on 12 Million Synthetic Images
NVIDIA has introduced Nemotron OCR v2, a multilingual optical character recognition model that processes 34.7 pages per second on a single A100 GPU. The model supports six languages: English, Japanese, Korean, Russian, Chinese, and one additional, unspecified language. It was trained on 12 million synthetic images derived from the mOSCAR multilingual web corpus. Built around a shared detection backbone (RegNetX-8GF), it improves Normalized Edit Distance scores on non-English languages from 0.56–0.92 to 0.035–0.069 (lower is better). Character support grows from 855 to 14,244 characters, now covering CJK and Cyrillic scripts. The dataset is published at nvidia/OCR-Synthetic-Multilingual-v1, and the model is available at nvidia/nemotron-ocr-v2. Key contributors include Bo Liu, Théo Viel, and Mike Ranzinger.
Key facts
- Nemotron OCR v2 processes 34.7 pages/second on a single A100 GPU
- Trained on 12 million synthetic images across six languages
- Uses synthetic data pipeline based on modified SynthDoG from Donut project
- Supports English, Japanese, Korean, Russian, Chinese, and one other language
- Normalized Edit Distance scores improved to 0.035–0.069 on non-English languages
- Dataset available at nvidia/OCR-Synthetic-Multilingual-v1 under CC-BY-4.0
- Model available at nvidia/nemotron-ocr-v2 under NVIDIA Open Model License
- Features shared detection backbone (RegNetX-8GF) for efficiency
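The Normalized Edit Distance (NED) figures above can be made concrete with a small sketch. One common definition, assumed here (the source does not spell out its exact normalization), divides the Levenshtein edit distance between predicted and ground-truth text by the length of the longer string, so 0.0 is a perfect transcription and values near 1.0 indicate a complete mismatch:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance
    # (unit cost for insertions, deletions, and substitutions).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def normalized_edit_distance(pred: str, target: str) -> float:
    # NED = edit distance / length of the longer string;
    # 0.0 = perfect match, 1.0 = complete mismatch.
    if not pred and not target:
        return 0.0
    return levenshtein(pred, target) / max(len(pred), len(target))
```

Under this definition, the reported non-English scores of 0.035–0.069 correspond to a few edits per hundred characters of recognized text.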
Entities
People
- Bo Liu
- Théo Viel
- Mike Ranzinger
Institutions
- NVIDIA
Projects, tools, and benchmarks
- Donut project
- mOSCAR
- SynthDoG
- HierText
- OmniDocBench
- PaddleOCR
- OpenOCR
- FOTS
Architectures
- RegNetX-8GF
- Transformer
Fonts
- Google Fonts
- Noto family
Licenses
- CC-BY-4.0
- NVIDIA Open Model License