Diffusion vs Autoregressive Language Models: Text Differences
A study on arXiv (2605.12522) compares text generated by diffusion language models (DLMs) and autoregressive language models (ARMs). Empirically, DLM text shows lower n-gram entropy, higher semantic coherence, and higher semantic diversity than ARM text. Controlled experiments decouple the effects of the training objective from those of the decoding algorithm. The DLM training objective boosts semantic coherence and diversity but barely affects entropy; within that objective, bidirectional context is the main driver, while input masking, label masking, and loss weighting have weaker influence. The entropy reduction instead stems from DLMs' decoding algorithms, particularly confidence-based sampling.
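The summary does not define the three measurements, so the sketch below shows one plausible operationalization: token-level Shannon entropy over n-grams, and coherence/diversity computed from sentence or sample embeddings. The function names and the embedding-based formulations are illustrative assumptions; the paper's exact metric definitions may differ.

```python
from collections import Counter
from math import log2

import numpy as np

def ngram_entropy(tokens: list[str], n: int = 2) -> float:
    """Shannon entropy (bits) of the empirical n-gram distribution.

    Lower values indicate more repetitive, predictable surface text,
    the pattern the study reports for DLM outputs.
    """
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def _cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_coherence(sentence_embs: np.ndarray) -> float:
    """Mean cosine similarity of consecutive sentence embeddings within
    one generation; higher means the text stays more on-topic."""
    sims = [_cosine(sentence_embs[i], sentence_embs[i + 1])
            for i in range(len(sentence_embs) - 1)]
    return float(np.mean(sims))

def semantic_diversity(sample_embs: np.ndarray) -> float:
    """Mean pairwise cosine distance between embeddings of independent
    samples for the same prompt; higher means more varied outputs."""
    n = len(sample_embs)
    dists = [1.0 - _cosine(sample_embs[i], sample_embs[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))
```

Under these definitions the reported pattern is not contradictory: a model can reuse the same local n-grams (low entropy) while its full samples differ from each other in meaning (high semantic diversity).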
Key facts
- arXiv paper 2605.12522 compares DLM and ARM text
- DLMs have lower n-gram entropy than ARMs
- DLMs exhibit higher semantic coherence and diversity
- Training objective increases coherence and diversity
- Bidirectional context is primary cause of differences
- Input masking, label masking, and loss weighting have minor effects
- Entropy reduction due to decoding algorithms
- Confidence-based sampling contributes to the entropy drop (see the decoding sketch after this list)
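To make "confidence-based sampling" concrete, here is a minimal sketch of one common decoding scheme for masked DLMs, in which the most confidently predicted masked positions are committed at each step. The `model` callable, `mask_id`, and the fixed per-step reveal budget are assumptions for illustration, not details taken from the paper.

```python
import torch

@torch.no_grad()
def confidence_based_decode(model, seq_len: int, steps: int, mask_id: int) -> torch.Tensor:
    """Sketch of confidence-based parallel decoding for a masked DLM.

    Each iteration, the model predicts all masked positions at once, and
    only the positions with the highest predicted-token probability are
    committed (unmasked). Greedily keeping high-confidence tokens is the
    mechanism the study links to the reduced n-gram entropy of DLM text.
    """
    x = torch.full((1, seq_len), mask_id, dtype=torch.long)
    per_step = max(1, seq_len // steps)  # hypothetical fixed reveal budget
    while (x == mask_id).any():
        masked = x == mask_id
        logits = model(x)                       # assumed shape: (1, seq_len, vocab)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        conf = conf.masked_fill(~masked, -1.0)  # never re-commit known tokens
        k = min(per_step, int(masked.sum()))
        top = conf.topk(k, dim=-1).indices[0]   # most confident masked slots
        x[0, top] = pred[0, top]
    return x
```

Because each step greedily commits the highest-probability tokens, the same high-probability local patterns tend to recur across samples, which is consistent with the entropy reduction the study attributes to this family of decoders.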
Entities
Institutions
- arXiv