ARTFEED — Contemporary Art Intelligence

NVIDIA Nemotron-Labs Diffusion Models Enable Speed-of-Light Text Generation

ai-technology · 2026-05-23

NVIDIA has released Nemotron-Labs Diffusion, a family of diffusion language models (DLMs) that generate multiple tokens in parallel and iteratively refine them, instead of producing one token per forward pass as autoregressive (AR) models do. The models come in 3B, 8B, and 14B scales under the NVIDIA Nemotron Open Model License, plus an 8B vision-language model under the NVIDIA Source Code License.

They support three inference modes: autoregressive, diffusion, and self-speculation. Self-speculation combines drafting and verification to accelerate generation losslessly. The 8B model achieves 1.2% higher average accuracy than Qwen3 8B. Diffusion mode reaches 2.6× tokens per forward pass, and self-speculation reaches up to 6.4×. When deployed via SGLang, the mode can be switched with a single config line. On B200 hardware, self-speculation achieves ~865 tok/s on speedbench, roughly 4× the autoregressive baseline.

Training used 1.3T tokens from NVIDIA Nemotron Pretraining datasets and 45B tokens from post-training datasets. It builds on the Efficient-DLM approach, which converts pretrained AR models into DLMs. The release includes base and instruction-tuned chat variants, training code via the NVIDIA Megatron Bridge framework, and a technical report.
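Self-speculation follows the draft-then-verify pattern of speculative decoding. Below is a minimal Python sketch of greedy draft-and-verify. It assumes a toy deterministic model: `target_next`, `draft_block`, and `verify` are hypothetical stand-ins for a parallel draft pass and an AR verification pass. They are not NVIDIA's or SGLang's API. The sketch shows why the scheme is lossless under greedy decoding: every committed token is one the AR model would have chosen itself. The toy counts only verification passes. The source does not say how passes are counted for the 6.4× figure.

```python
from typing import List, Tuple

VOCAB = 50  # toy vocabulary size (hypothetical)


def target_next(prefix: List[int]) -> int:
    """Hypothetical stand-in for the model's greedy autoregressive prediction."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def greedy_generate(prompt: List[int], n_new: int) -> List[int]:
    """Plain AR baseline: one forward pass per new token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq


def draft_block(prefix: List[int], k: int) -> List[int]:
    """Hypothetical drafter standing in for a parallel draft pass.

    It proposes k tokens and is deliberately wrong at every position p with p % 4 == 3.
    """
    ctx = list(prefix)
    draft: List[int] = []
    for _ in range(k):
        tok = target_next(ctx)
        if len(ctx) % 4 == 3:
            tok = (tok + 1) % VOCAB  # simulated drafting error
        draft.append(tok)
        ctx.append(tok)
    return draft


def verify(prefix: List[int], draft: List[int]) -> List[int]:
    """Simulated single verification pass: the AR prediction after prefix + draft[:i]
    for every i in 0..k (k + 1 predictions; the last one is a 'bonus' token)."""
    ctx = list(prefix)
    preds = [target_next(ctx)]
    for tok in draft:
        ctx.append(tok)
        preds.append(target_next(ctx))
    return preds


def speculative_generate(prompt: List[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    seq = list(prompt)
    passes = 0
    while len(seq) - len(prompt) < n_new:
        draft = draft_block(seq, k)
        preds = verify(seq, draft)
        passes += 1
        # Accept the longest prefix of the draft that the verifier agrees with.
        n_ok = 0
        while n_ok < k and draft[n_ok] == preds[n_ok]:
            n_ok += 1
        seq.extend(draft[:n_ok])
        # Then take the verifier's own token: a correction, or a bonus if all k matched.
        seq.append(preds[n_ok])
    return seq[: len(prompt) + n_new], passes


prompt = [1, 2, 3]
out, passes = speculative_generate(prompt, 20, k=4)
print(out == greedy_generate(prompt, 20), passes)  # True 6
```

A rejected draft token is replaced by the verifier's own prediction, so each pass commits at least one token. In the worst case, the scheme therefore does no worse than plain AR decoding, measured per verification pass. Every accepted draft token saves a pass.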

Key facts

  • Nemotron-Labs Diffusion generates multiple tokens in parallel and iteratively refines them.
  • Models available at 3B, 8B, and 14B scales under NVIDIA Nemotron Open Model License.
  • 8B vision-language model available under NVIDIA Source Code License.
  • Supports autoregressive, diffusion, and self-speculation inference modes.
  • 8B model achieves 1.2% higher average accuracy than Qwen3 8B.
  • Diffusion mode reaches 2.6× tokens per forward pass; self-speculation up to 6.4×.
  • Self-speculation on B200 achieves ~865 tok/s on speedbench, ~4× AR baseline.
  • Trained on 1.3T tokens from NVIDIA Nemotron Pretraining datasets and 45B tokens from post-training datasets.
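The diffusion-mode figure of "tokens per forward pass" can be illustrated with confidence-threshold parallel unmasking. This is a common decoding rule in the DLM literature, but the source does not say which unmasking rule Nemotron-Labs Diffusion uses. Below is a minimal sketch, assuming a hypothetical toy denoiser. `denoise`, `ANSWER`, and `BASE_CONF` are illustrative and not part of any NVIDIA release. Each simulated forward pass commits every masked position whose confidence clears a threshold, so the sequence finishes in fewer passes than it has tokens.

```python
from typing import List, Optional, Tuple

# Hypothetical toy "denoiser" state: it always predicts the right token, and its
# confidence (integer percent) at each position rises as more positions are revealed.
ANSWER = [11, 22, 33, 44, 55, 66, 77, 88]
BASE_CONF = [95, 90, 50, 80, 40, 70, 30, 60]


def denoise(seq: List[Optional[int]]) -> List[Tuple[int, int]]:
    """One simulated forward pass: a (token, confidence) pair for every position."""
    revealed = sum(t is not None for t in seq)
    return [(ANSWER[i], min(100, BASE_CONF[i] + 10 * revealed)) for i in range(len(seq))]


def parallel_decode(length: int, threshold: int = 90) -> Tuple[List[int], int]:
    """Start fully masked (None) and unmask confident positions in parallel each pass."""
    seq: List[Optional[int]] = [None] * length
    passes = 0
    while any(t is None for t in seq):
        preds = denoise(seq)
        passes += 1
        masked = [i for i, t in enumerate(seq) if t is None]
        # Commit every masked position whose confidence clears the threshold.
        chosen = [i for i in masked if preds[i][1] >= threshold]
        if not chosen:
            # Guarantee progress: fall back to the single most confident position.
            chosen = [max(masked, key=lambda i: preds[i][1])]
        for i in chosen:
            seq[i] = preds[i][0]
    return [t for t in seq if t is not None], passes


tokens, passes = parallel_decode(len(ANSWER))
print(tokens == ANSWER, passes, len(tokens) / passes)  # True 4 2.0
```

Here 8 tokens are decoded in 4 passes, or 2 tokens per forward pass. Raising `threshold` trades speed for caution: fewer positions are committed per pass, which pushes the ratio toward the 1 token per pass of AR decoding.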

Entities

Institutions

  • NVIDIA
  • Hugging Face
  • GitHub
  • SGLang

Sources