TIDE: First Cross-Architecture Distillation Framework for Diffusion LLMs
A team of researchers has unveiled TIDE, the first framework to enable cross-architecture knowledge distillation for diffusion large language models (dLLMs). Unlike prior techniques, which are restricted to same-architecture transfer, TIDE allows the teacher and student models to differ in architecture, attention mechanism, and tokenizer. The framework consists of three modular components: TIDAL, which modulates distillation strength across training progress and diffusion timestep to reflect the teacher's noise-dependent reliability; CompDemo, which enriches the teacher's context through complementary mask splitting to improve predictions under heavy masking; and Reverse CALM, a cross-tokenizer objective that reverses chunk-level likelihood matching to keep gradients bounded. The work addresses a significant gap in dLLM distillation: state-of-the-art dLLMs require billions of parameters for competitive performance, so transferring their capabilities into smaller or differently built students is valuable. The paper is available on arXiv under ID 2604.26951.
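The summary does not specify TIDAL's actual schedule. As a rough illustration only, the PyTorch sketch below weights a per-token KL distillation loss by a factor that decays with the diffusion timestep (treated here as a mask ratio in [0, 1], where the teacher is least reliable) and ramps up with training progress; the power-law shapes, the names `tidal_weight` and `distill_loss`, and the loss form are assumptions, not the paper's formulation.

```python
import torch

def tidal_weight(t: torch.Tensor, progress: float,
                 alpha: float = 2.0, beta: float = 1.0) -> torch.Tensor:
    """Hypothetical TIDAL-style weight (not the paper's exact formula).

    t: diffusion timestep / mask ratio per example, in [0, 1].
    progress: fraction of training completed, in [0, 1].
    """
    reliability = (1.0 - t).clamp(min=0.0) ** alpha  # trust the teacher less as noise grows
    schedule = progress ** beta                      # distill harder as training progresses
    return reliability * schedule

def distill_loss(student_logits, teacher_logits, t, progress, mask):
    """Per-token KL to the teacher, scaled by the TIDAL-style weight and
    averaged over masked positions only. Shapes: logits are
    [batch, seq, vocab], t is [batch], mask is [batch, seq]."""
    mask = mask.float()
    log_p_s = torch.log_softmax(student_logits, dim=-1)
    p_t = torch.softmax(teacher_logits, dim=-1)
    kl = (p_t * (p_t.clamp_min(1e-9).log() - log_p_s)).sum(-1)  # [batch, seq]
    w = tidal_weight(t, progress).unsqueeze(-1)                 # broadcast over sequence
    return (w * kl * mask).sum() / mask.sum().clamp(min=1.0)
```

The shape of the weight is guesswork, but the intent mirrors the stated motivation: a teacher's predictions degrade as more of the sequence is masked, so its signal is trusted less at high noise levels.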
Key facts
- TIDE is the first cross-architecture distillation framework for diffusion large language models.
- It allows teacher and student to differ in architecture, attention mechanism, and tokenizer.
- TIDAL modulates distillation strength across training progress and diffusion timestep.
- CompDemo uses complementary mask splitting to improve predictions under heavy masking (see the sketch after this list).
- Reverse CALM is a cross-tokenizer objective that reverses chunk-level likelihood matching to keep gradients bounded.
- Prior distillation methods for dLLMs only work within a single architecture.
- State-of-the-art dLLMs need billions of parameters for competitive performance.
- The paper is published on arXiv with ID 2604.26951.
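How complementary mask splitting works is not detailed in the summary. One plausible reading, sketched below under stated assumptions, is that the masked positions are split into two complementary halves, and each teacher pass reveals the opposite half from the clean sequence, so the teacher predicts every masked token under a lighter effective mask; the two passes are then merged position-wise. The function names, the random 50/50 split, and the merge rule are hypothetical.

```python
import torch

def complementary_mask_split(mask: torch.Tensor, generator=None):
    """Split a boolean mask (True = position is masked) into two
    complementary halves via a coin flip per masked position."""
    coin = torch.rand(mask.shape, generator=generator) < 0.5
    return mask & coin, mask & ~coin

@torch.no_grad()
def compdemo_teacher_logits(teacher, x, x_clean, mask):
    """Hypothetical CompDemo-style teacher call (illustrative only).

    x: input ids with masked positions already set to the mask token.
    x_clean: the original, fully unmasked token ids.
    For each half of the masked set, the *other* half is revealed from
    x_clean, giving the teacher richer context than the full mask.
    """
    mask_a, mask_b = complementary_mask_split(mask)
    # Pass 1: half A stays masked, half B is revealed from the clean tokens.
    x_a = torch.where(mask_b, x_clean, x)
    # Pass 2: half B stays masked, half A is revealed.
    x_b = torch.where(mask_a, x_clean, x)
    logits_a = teacher(x_a)
    logits_b = teacher(x_b)
    # Each position's prediction comes from the pass in which it was masked.
    return torch.where(mask_a.unsqueeze(-1), logits_a, logits_b)
```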