ARTFEED — Contemporary Art Intelligence

Diffusion LLMs Outperform Autoregressive Models in Knowledge Injection Without Paraphrasing

ai-technology · 2026-05-07

A new study on arXiv (2510.09885) compares diffusion large language models (dLLMs) and autoregressive LLMs (arLLMs) on knowledge-injection fine-tuning. The researchers found that dLLMs reach lower pre-training loss with fewer training samples and are more resistant to the reversal curse, in which a model taught a fact in one direction ("A is B") fails to recall it in reverse ("B is A"). In controlled experiments, dLLMs attained high question-answering (QA) accuracy without paraphrase augmentation, whereas arLLMs depended on paraphrases to generalize injected facts into QA ability. The study also probes whether the demasking objective alone, rather than the full diffusion denoising paradigm, accounts for this advantage. The findings suggest dLLMs may absorb new factual knowledge more efficiently than arLLMs, potentially reducing the computational cost of keeping models current with evolving information.
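To make the contrast concrete, here is a minimal PyTorch-style sketch of the two training objectives being compared: the left-to-right next-token loss used by arLLMs versus a demasking loss computed only at randomly masked positions, which see bidirectional context. This is an illustrative toy under our own assumptions, not the paper's code; the random tensors stand in for real model outputs.

    import torch
    import torch.nn.functional as F

    def autoregressive_loss(logits, tokens):
        # arLLM objective: predict each token from its left context only.
        # logits: (batch, seq, vocab); tokens: (batch, seq)
        return F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),  # predictions for positions 1..T-1
            tokens[:, 1:].reshape(-1),                    # targets shifted one step right
        )

    def demasking_loss(logits, tokens, mask):
        # dLLM-style objective: a random subset of positions is masked and
        # predicted from bidirectional context; only those positions count.
        # mask: (batch, seq) bool, True where a token was hidden from the model.
        return F.cross_entropy(logits[mask], tokens[mask])

    B, T, V = 2, 16, 100
    tokens = torch.randint(0, V, (B, T))
    logits = torch.randn(B, T, V)          # stand-in for a model's output
    mask = torch.rand(B, T) < 0.3          # roughly 30% of positions masked
    print(autoregressive_loss(logits, tokens).item())
    print(demasking_loss(logits, tokens, mask).item())

The key structural difference is visible in the indexing: the autoregressive loss ties every prediction to a strictly left-to-right shift, while the demasking loss scores an arbitrary subset of positions, so each masked token can be recovered from context on both sides.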

Key facts

  • Study compares diffusion LLMs and autoregressive LLMs in knowledge fine-tuning
  • dLLMs reach lower pre-training loss with fewer training samples
  • dLLMs are more resistant to the reversal curse
  • arLLMs rely on paraphrase augmentation to generalize to QA (see the sketch after this list)
  • dLLMs achieve high QA accuracy without paraphrases
  • Research investigates demasking objective role in dLLMs' advantage
  • Published on arXiv with ID 2510.09885
  • Potential to reduce computational costs for LLM knowledge updates
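As a concrete illustration of the paraphrase and reversal-curse points above, the snippet below builds the kind of data involved: restatements of a single fact in varied surface forms (the augmentation arLLMs needed), and forward/backward questions of the sort used to probe the reversal curse. The fact, names, and templates here are invented for illustration and do not come from the paper.

    # A hypothetical injected fact; the study's actual corpus differs.
    fact = {"subject": "Alice Zhou", "relation": "directed", "object": "Glass Harbor"}

    # Paraphrase augmentation: the same fact restated in varied surface forms.
    paraphrases = [
        f"{fact['subject']} {fact['relation']} {fact['object']}.",
        f"{fact['object']} was {fact['relation']} by {fact['subject']}.",
        f"The film {fact['object']} is the work of {fact['subject']}.",
    ]

    # Reversal-curse probe: query the fact in both directions.
    forward_q = f"What did {fact['subject']} direct?"   # expected answer: the object
    backward_q = f"Who directed {fact['object']}?"      # expected answer: the subject

    for line in paraphrases + [forward_q, backward_q]:
        print(line)

Per the study's findings, arLLMs fine-tuned on only the first paraphrase tend to answer the forward question but fail the backward one, while dLLMs answer both without the extra restatements.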

Entities

Institutions

  • arXiv

Sources

  • arXiv:2510.09885 (https://arxiv.org/abs/2510.09885)