ARTFEED — Contemporary Art Intelligence

NVIDIA Releases Nemotron OCR v2 Multilingual Model Trained on 12 Million Synthetic Images

ai-technology · 2026-04-19

NVIDIA has introduced Nemotron OCR v2, a multilingual optical character recognition model that processes 34.7 pages per second on a single A100 GPU. The model supports six languages: English, Japanese, Korean, Russian, Chinese, and an undisclosed sixth language. It was trained on 12 million synthetic images generated from text in the mOSCAR multilingual web corpus, and it shares a single detection backbone (RegNetX-8GF) across all languages for efficiency. Normalized Edit Distance on non-English languages improves from 0.56–0.92 to 0.035–0.069 (lower is better), and character coverage grows from 855 to 14,244 characters, adding CJK and Cyrillic scripts. The dataset is published at nvidia/OCR-Synthetic-Multilingual-v1 and the model at nvidia/nemotron-ocr-v2. Key contributors include Bo Liu, Théo Viel, and Mike Ranzinger.
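Normalized Edit Distance divides the Levenshtein (edit) distance between predicted and reference text by string length, so 0.0 is a perfect match and lower is better. Definitions vary (some normalize by reference length); the article does not specify, so the sketch below normalizes by the longer string. This is a standard dynamic-programming implementation of the metric, not NVIDIA's evaluation code:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]

def normalized_edit_distance(pred: str, ref: str) -> float:
    """NED in [0, 1]: 0.0 means the prediction matches the reference exactly."""
    if not pred and not ref:
        return 0.0
    return levenshtein(pred, ref) / max(len(pred), len(ref))

# 3 edits over 7 characters ≈ 0.4286
print(normalized_edit_distance("kitten", "sitting"))
```

Read this way, a score of 0.035 corresponds to roughly 3.5 edits per 100 characters of output.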

Key facts

  • Nemotron OCR v2 processes 34.7 pages/second on a single A100 GPU
  • Trained on 12 million synthetic images across six languages
  • Uses a synthetic data pipeline based on a modified SynthDoG (from the Donut project)
  • Supports English, Japanese, Korean, Russian, Chinese, and one other language
  • Normalized Edit Distance on non-English languages improved from 0.56–0.92 to 0.035–0.069 (lower is better)
  • Dataset available at nvidia/OCR-Synthetic-Multilingual-v1 under CC-BY-4.0
  • Model available at nvidia/nemotron-ocr-v2 under NVIDIA Open Model License
  • Features shared detection backbone (RegNetX-8GF) for efficiency
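The jump from 855 to 14,244 supported characters comes mostly from CJK and Cyrillic coverage. The charset itself is not published in this summary, but the idea can be illustrated with Python's standard unicodedata module, which exposes each character's Unicode name (the sample characters below are arbitrary):

```python
import unicodedata

def script_of(ch: str) -> str:
    """Coarse script label derived from the character's Unicode name."""
    try:
        name = unicodedata.name(ch)
    except ValueError:  # unnamed code points (e.g. control characters)
        return "UNKNOWN"
    for script in ("CJK", "HIRAGANA", "KATAKANA", "HANGUL", "CYRILLIC", "LATIN"):
        if script in name:
            return script
    return "OTHER"

# Scripts an English-only charset would miss entirely:
for ch in "Aあカ한漢Я":
    print(ch, script_of(ch))
```

Expanding the output charset along these lines is what lets a single recognition head serve all six languages.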

Entities

Contributors

  • Bo Liu
  • Théo Viel
  • Mike Ranzinger

Institutions

  • NVIDIA
  • Google Fonts

Projects & technologies

  • Donut project
  • mOSCAR
  • SynthDoG
  • HierText
  • OmniDocBench
  • PaddleOCR
  • OpenOCR
  • FOTS
  • RegNetX-8GF
  • Transformer
  • Noto family

Licenses

  • CC-BY-4.0
  • NVIDIA Open Model License
