ARTFEED — Contemporary Art Intelligence

Efficient-DLM: Converting Autoregressive Models to Fast Diffusion Language Models

publication · 2026-05-01

A new study on arXiv (2512.14067) introduces Efficient-DLM, a method for converting pretrained autoregressive (AR) language models into efficient diffusion language models (dLMs) that generate text in parallel while preserving task accuracy. The researchers identify limitations in existing AR-to-dLM conversion methods, particularly in their attention patterns and training objectives. They propose a continuous pretraining scheme built around a block-wise attention pattern that stays causal across blocks but allows bidirectional attention within each block, thereby preserving the pretrained AR weight distributions. The approach aims to close the learning-efficiency gap that dLMs trained from scratch exhibit relative to AR models.
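The block-wise attention pattern is the core structural change. As a rough sketch (not the authors' code; the helper name and sizes are illustrative assumptions), a mask of this shape can be built in a few lines of PyTorch:

    import torch

    def block_wise_attention_mask(seq_len: int, block_size: int) -> torch.Tensor:
        # Block index of each position, e.g. block_size=4 -> 0,0,0,0,1,1,1,1,...
        block_ids = torch.arange(seq_len) // block_size
        # Query i may attend to key j iff j's block is not after i's block:
        # bidirectional inside a block, causal at block granularity across blocks.
        return block_ids.unsqueeze(1) >= block_ids.unsqueeze(0)

    mask = block_wise_attention_mask(seq_len=8, block_size=4)
    # Rows 0-3 (block 0) attend to all of block 0 and nothing later;
    # rows 4-7 (block 1) attend to blocks 0 and 1 in full.

Note that with block_size=1 this reduces to the ordinary causal mask, which suggests why the conversion can start from AR weights with little disruption.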

Key facts

  • arXiv paper 2512.14067 titled 'Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed'
  • Study focuses on converting pretrained AR models into efficient dLMs
  • Conversion aims to enable parallel non-autoregressive generation while preserving AR model accuracy (see the decoding sketch after this list)
  • Researchers identified limitations in attention patterns and objectives of existing AR-to-dLM methods
  • Proposed continuous pretraining scheme with block-wise attention pattern
  • Block-wise attention remains causal across blocks but bidirectional within blocks
  • Maintaining pretrained AR weight distributions is critical for effective conversion
  • Method addresses the learning-efficiency gap between dLMs trained from scratch and AR models
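As context for the parallel-generation claim, the following is a generic confidence-based block decoding loop of the kind block diffusion models use. It is a sketch under stated assumptions, not the paper's sampler: model stands for any network returning per-position logits, mask_id is the mask-token id, and the unmask schedule is illustrative:

    import torch

    @torch.no_grad()
    def decode_blockwise(model, prompt_ids, num_blocks=8, block_size=4,
                         steps_per_block=2, mask_id=0):
        # Blocks are produced left to right; tokens inside a block are
        # unmasked several at a time, which is where the speedup comes from.
        per_step = block_size // steps_per_block
        seq = prompt_ids.clone()                      # shape (1, prompt_len)
        for _ in range(num_blocks):
            block = torch.full((1, block_size), mask_id, dtype=seq.dtype)
            seq = torch.cat([seq, block], dim=1)      # append a masked block
            for _ in range(steps_per_block):
                logits = model(seq)                   # assumed (1, len, vocab)
                probs = logits[:, -block_size:].softmax(dim=-1)
                conf, pred = probs.max(dim=-1)        # each (1, block_size)
                masked = seq[:, -block_size:] == mask_id
                conf = conf.masked_fill(~masked, float("-inf"))
                top = conf[0].topk(per_step).indices  # best masked slots
                seq[0, seq.size(1) - block_size + top] = pred[0, top]
        return seq

Each inner step commits per_step tokens at once, so a block of 4 tokens costs 2 forward passes rather than the 4 an AR decoder would need.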

Entities

Institutions

  • arXiv

Sources

  • arXiv:2512.14067 · https://arxiv.org/abs/2512.14067