ARTFEED — Contemporary Art Intelligence

Diffusion LLMs Outperform Autoregressive Models in Knowledge Injection Without Paraphrasing

ai-technology · 2026-05-07

A new study on arXiv (2510.09885) compares diffusion large language models (dLLMs) and autoregressive LLMs (arLLMs) on knowledge-injection fine-tuning. The researchers found that dLLMs reach lower pre-training loss with fewer training samples and are more resistant to the reversal curse, in which a model taught a fact in one direction ("A is B") fails to recall it in reverse ("B is A"). In controlled experiments, dLLMs attained high question-answering (QA) accuracy without paraphrase augmentation, whereas arLLMs depended on paraphrases to generalize injected facts into QA ability. The study also probes whether the demasking objective alone, rather than the full diffusion denoising paradigm, accounts for this advantage. The findings suggest dLLMs may absorb new factual knowledge more efficiently than arLLMs, potentially reducing the computational cost of keeping models current with evolving information.
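To make the contrast concrete, here is a minimal PyTorch-style sketch of the two training objectives being compared: the left-to-right next-token loss used by arLLMs versus a demasking loss computed only at randomly masked positions, which see bidirectional context. This is an illustrative toy under our own assumptions, not the paper's code; the random tensors stand in for real model outputs.

    import torch
    import torch.nn.functional as F

    def autoregressive_loss(logits, tokens):
        # arLLM objective: predict each token from its left context only.
        # logits: (batch, seq, vocab); tokens: (batch, seq)
        return F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),  # predictions for positions 1..T-1
            tokens[:, 1:].reshape(-1),                    # targets shifted one step right
        )

    def demasking_loss(logits, tokens, mask):
        # dLLM-style objective: a random subset of positions is masked and
        # predicted from bidirectional context; only those positions count.
        # mask: (batch, seq) bool, True where a token was hidden from the model.
        return F.cross_entropy(logits[mask], tokens[mask])

    B, T, V = 2, 16, 100
    tokens = torch.randint(0, V, (B, T))
    logits = torch.randn(B, T, V)          # stand-in for a model's output
    mask = torch.rand(B, T) < 0.3          # roughly 30% of positions masked
    print(autoregressive_loss(logits, tokens).item())
    print(demasking_loss(logits, tokens, mask).item())

The key structural difference is visible in the indexing: the autoregressive loss ties every prediction to a strictly left-to-right shift, while the demasking loss scores an arbitrary subset of positions, so each masked token can be recovered from context on both sides.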

Key facts

  • Study compares diffusion LLMs and autoregressive LLMs in knowledge fine-tuning
  • dLLMs reach lower pre-training loss with fewer training samples
  • dLLMs are more resistant to the reversal curse
  • arLLMs rely on paraphrase augmentation to generalize to QA (see the sketch after this list)
  • dLLMs achieve high QA accuracy without paraphrases
  • Research investigates demasking objective role in dLLMs' advantage
  • Published on arXiv with ID 2510.09885
  • Potential to reduce computational costs for LLM knowledge updates
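As a concrete illustration of the paraphrase and reversal-curse points above, the snippet below builds the kind of data involved: restatements of a single fact in varied surface forms (the augmentation arLLMs needed), and forward/backward questions of the sort used to probe the reversal curse. The fact, names, and templates here are invented for illustration and do not come from the paper.

    # A hypothetical injected fact; the study's actual corpus differs.
    fact = {"subject": "Alice Zhou", "relation": "directed", "object": "Glass Harbor"}

    # Paraphrase augmentation: the same fact restated in varied surface forms.
    paraphrases = [
        f"{fact['subject']} {fact['relation']} {fact['object']}.",
        f"{fact['object']} was {fact['relation']} by {fact['subject']}.",
        f"The film {fact['object']} is the work of {fact['subject']}.",
    ]

    # Reversal-curse probe: query the fact in both directions.
    forward_q = f"What did {fact['subject']} direct?"   # expected answer: the object
    backward_q = f"Who directed {fact['object']}?"      # expected answer: the subject

    for line in paraphrases + [forward_q, backward_q]:
        print(line)

Per the study's findings, arLLMs fine-tuned on only the first paraphrase tend to answer the forward question but fail the backward one, while dLLMs answer both without the extra restatements.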

Entities

Institutions

  • arXiv

Sources

  • arXiv:2510.09885 (https://arxiv.org/abs/2510.09885)