Diffusion vs Autoregressive Language Models: Text Differences
A study on arXiv (2605.12522) compares text generated by diffusion language models (DLMs) and autoregressive language models (ARMs). Empirically, DLM text shows lower n-gram entropy, higher semantic coherence, and higher semantic diversity than ARM text. Controlled experiments decouple the effects of the training objective from those of the decoding algorithm. The DLM training objective boosts semantic coherence and diversity but barely affects entropy; within that objective, bidirectional context is the main driver, while input masking, label masking, and loss weighting have weaker influence. The entropy reduction instead stems from DLMs' decoding algorithms, particularly confidence-based sampling.
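The summary does not define the three measurements, so the sketch below shows one plausible operationalization: token-level Shannon entropy over n-grams, and coherence/diversity computed from sentence or sample embeddings. The function names and the embedding-based formulations are illustrative assumptions; the paper's exact metric definitions may differ.

```python
from collections import Counter
from math import log2

import numpy as np

def ngram_entropy(tokens: list[str], n: int = 2) -> float:
    """Shannon entropy (bits) of the empirical n-gram distribution.

    Lower values indicate more repetitive, predictable surface text,
    the pattern the study reports for DLM outputs.
    """
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def _cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_coherence(sentence_embs: np.ndarray) -> float:
    """Mean cosine similarity of consecutive sentence embeddings within
    one generation; higher means the text stays more on-topic."""
    sims = [_cosine(sentence_embs[i], sentence_embs[i + 1])
            for i in range(len(sentence_embs) - 1)]
    return float(np.mean(sims))

def semantic_diversity(sample_embs: np.ndarray) -> float:
    """Mean pairwise cosine distance between embeddings of independent
    samples for the same prompt; higher means more varied outputs."""
    n = len(sample_embs)
    dists = [1.0 - _cosine(sample_embs[i], sample_embs[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))
```

Under these definitions the reported pattern is not contradictory: a model can reuse the same local n-grams (low entropy) while its full samples differ from each other in meaning (high semantic diversity).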
Key facts
- arXiv paper 2605.12522 compares DLM and ARM text
- DLMs have lower n-gram entropy than ARMs
- DLMs exhibit higher semantic coherence and diversity
- Training objective increases coherence and diversity
- Bidirectional context is primary cause of differences
- Input masking, label masking, and loss weighting have minor effects
- Entropy reduction due to decoding algorithms
- Confidence-based sampling contributes to the entropy drop (see the decoding sketch after this list)
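To make "confidence-based sampling" concrete, here is a minimal sketch of one common decoding scheme for masked DLMs, in which the most confidently predicted masked positions are committed at each step. The `model` callable, `mask_id`, and the fixed per-step reveal budget are assumptions for illustration, not details taken from the paper.

```python
import torch

@torch.no_grad()
def confidence_based_decode(model, seq_len: int, steps: int, mask_id: int) -> torch.Tensor:
    """Sketch of confidence-based parallel decoding for a masked DLM.

    Each iteration, the model predicts all masked positions at once, and
    only the positions with the highest predicted-token probability are
    committed (unmasked). Greedily keeping high-confidence tokens is the
    mechanism the study links to the reduced n-gram entropy of DLM text.
    """
    x = torch.full((1, seq_len), mask_id, dtype=torch.long)
    per_step = max(1, seq_len // steps)  # hypothetical fixed reveal budget
    while (x == mask_id).any():
        masked = x == mask_id
        logits = model(x)                       # assumed shape: (1, seq_len, vocab)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)
        conf = conf.masked_fill(~masked, -1.0)  # never re-commit known tokens
        k = min(per_step, int(masked.sum()))
        top = conf.topk(k, dim=-1).indices[0]   # most confident masked slots
        x[0, top] = pred[0, top]
    return x
```

Because each step greedily commits the highest-probability tokens, the same high-probability local patterns tend to recur across samples, which is consistent with the entropy reduction the study attributes to this family of decoders.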
Entities
Institutions
- arXiv