ARTFEED — Contemporary Art Intelligence

TIDE: First Cross-Architecture Distillation Framework for Diffusion LLMs

ai-technology · 2026-04-30

A team of researchers has unveiled TIDE, the first framework to enable cross-architecture knowledge distillation for diffusion large language models (dLLMs). Unlike prior methods, which are restricted to same-architecture transfer, TIDE allows the teacher and student to differ in architecture, attention mechanism, and tokenizer. The framework consists of three modular components: TIDAL, which modulates distillation strength across training progress and diffusion timestep to reflect the teacher's noise-dependent reliability; CompDemo, which enriches the teacher's context via complementary mask splitting, improving predictions under heavy masking; and Reverse CALM, a cross-tokenizer objective that reverses chunk-level likelihood matching to keep gradients bounded. The work addresses a practical gap in dLLM distillation: state-of-the-art dLLMs need billions of parameters for competitive performance, which makes distilling them into smaller students valuable. The paper is available on arXiv under ID 2604.26951.
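The summary does not give TIDAL's actual schedule. As a minimal PyTorch-style sketch of the stated idea (a distillation weight that grows with training progress and decays at noisier diffusion timesteps), the code below uses an assumed linear form and hypothetical names, not the paper's formula:

    import torch
    import torch.nn.functional as F

    def tidal_weight(timestep: float, progress: float) -> float:
        # Hypothetical schedule (assumed linear form, not the paper's).
        # Both arguments are normalized to [0, 1]: the teacher's masked-token
        # predictions are less reliable at high noise, so the weight decays
        # with `timestep`, and it ramps up as training `progress` advances.
        return progress * (1.0 - timestep)

    def weighted_distill_loss(student_logits, teacher_logits, timestep, progress):
        # KL distillation term scaled by the schedule above. Assumes a shared
        # vocabulary for simplicity; the cross-tokenizer case is what
        # Reverse CALM addresses.
        kl = F.kl_div(
            F.log_softmax(student_logits, dim=-1),
            F.softmax(teacher_logits, dim=-1),
            reduction="batchmean",
        )
        return tidal_weight(timestep, progress) * kl

Any monotone weighting with these two dependencies would fit the description above; the point is only that distillation strength is not constant across noise levels or training time.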

Key facts

  • TIDE is the first cross-architecture distillation framework for diffusion large language models.
  • It allows teacher and student to differ in architecture, attention mechanism, and tokenizer.
  • TIDAL modulates distillation strength across training progress and diffusion timestep.
  • CompDemo uses complementary mask splitting to improve predictions under heavy masking (a minimal sketch follows this list).
  • Reverse CALM is a cross-tokenizer objective that reverses chunk-level likelihood matching to keep gradients bounded.
  • Prior distillation methods for dLLMs only work within a single architecture.
  • State-of-the-art dLLMs need billions of parameters for competitive performance.
  • The paper is published on arXiv with ID 2604.26951.
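To make the complementary-mask idea concrete, here is a minimal sketch under assumed semantics (True marks a masked position; how TIDE actually splits masks and recombines the two teacher passes is not detailed in this summary):

    import torch

    def complementary_splits(mask, generator=None):
        # Illustrative split of a Boolean mask (True = masked) into two
        # complementary halves. Each half re-reveals roughly 50% of the
        # masked positions (reference tokens are available at training
        # time), so the teacher sees denser context in each pass; every
        # originally masked position stays masked in exactly one half,
        # where its teacher prediction would be collected.
        coin = torch.rand(mask.shape, generator=generator) < 0.5
        return mask & coin, mask & ~coin

Two teacher passes over the complementary halves together cover all originally masked positions, each pass with more visible context than a single heavily masked input would provide.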

Entities

Institutions

  • arXiv
