ARTFEED — Contemporary Art Intelligence

Diffusion-Adaptive Routing Enhances Information Flow in DiTs

publication · 2026-05-22

A recent study posted to arXiv (2605.20708) examines how information flows across layers in Diffusion Transformers (DiTs), analyzing depth and denoising timestep jointly. It identifies three symptoms of conventional residual addition: magnitudes in the forward pass that inflate monotonically with depth, backward gradients that decay sharply, and pronounced redundancy between blocks. To address these problems, the authors propose Diffusion-Adaptive Routing (DAR), a drop-in replacement for residual connections that aggregates block outputs in a learnable, timestep-adaptive, and non-incremental way without increasing complexity. DAR aims to improve how information is routed through depth and how denoising behaves across timesteps. In doing so, it revisits the residual stream, a core design choice that DiTs inherited from the original Transformer and that has remained largely unchanged.
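The forward-magnitude symptom can be reproduced in a toy setting. The sketch below is not the paper's analysis: it assumes each block's output is an independent standard-normal vector (no trained weights, no normalization), and the function name `residual_stream_norms` and the depth and width values are arbitrary choices for illustration. Under plain residual addition the stream is a running sum, so with independent unit-variance updates its expected squared norm grows linearly with the number of blocks, and the norm itself grows roughly like the square root of depth.

```python
import math
import random
from typing import List


def residual_stream_norms(depth: int, dim: int, seed: int = 0) -> List[float]:
    """Toy model of a residual stream: h_{l+1} = h_l + f_l(h_l).

    Assumption for illustration only: each block output f_l is replaced by an
    independent standard-normal vector, not a trained DiT block.
    Returns the L2 norm of the stream before any block and after each block.
    """
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norms = [math.sqrt(sum(x * x for x in h))]
    for _ in range(depth):
        update = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        h = [x + u for x, u in zip(h, update)]  # plain residual addition
        norms.append(math.sqrt(sum(x * x for x in h)))
    return norms


if __name__ == "__main__":
    dim = 1024
    norms = residual_stream_norms(depth=28, dim=dim)
    for layer in (0, 7, 14, 28):
        # With independent unit-variance updates, E[norm^2] = (layer + 1) * dim.
        predicted = math.sqrt((layer + 1) * dim)
        print(f"after {layer:2d} blocks: norm={norms[layer]:.1f}, "
              f"sqrt prediction={predicted:.1f}")
```

In this toy the norm after 28 additions is roughly sqrt(29), or about 5.4 times, the starting norm. A trained DiT has learned, correlated updates and normalization layers, so the exact growth curve will differ; the sketch only shows why additive accumulation tends to inflate magnitudes.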

Key facts

  • Paper ID: arXiv:2605.20708v1
  • Announce type: cross (cross-listed from another arXiv category)
  • Focus: cross-layer information flow in Diffusion Transformers (DiTs)
  • Identifies three symptoms: monotonic forward magnitude inflation, sharp backward gradient decay, pronounced block-wise redundancy
  • Proposes Diffusion-Adaptive Routing (DAR) as a drop-in residual replacement
  • DAR performs learnable, timestep-adaptive, and non-incremental aggregation of block outputs
  • Addresses the residual stream design inherited from the original Transformer
  • Analysis conducted jointly along depth and denoising timestep
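One way to picture "learnable, timestep-adaptive, non-incremental aggregation" is sketched below. It is a hypothetical illustration, not DAR's actual parameterization, which this summary does not describe: the class `TimestepAdaptiveRouter`, the function `routed_forward`, the parameter names `base` and `slope`, and the choice of softmax logits that depend linearly on the normalized timestep t / T are all assumptions. Instead of updating a running sum (h + f(h)), each block reads a weighted mix of the input and every earlier block output, with mixing weights that change with the denoising timestep.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TimestepAdaptiveRouter:
    """Hypothetical router (not the paper's DAR): block l reads a softmax-weighted
    mix of all stored states, with logits linear in the normalized timestep."""

    def __init__(self, num_blocks: int, num_timesteps: int):
        self.num_timesteps = num_timesteps
        # Row l holds one logit per available state (l + 1 of them); the extra
        # final row is used for the read-out. Would be trainable in a real model;
        # zero init gives a uniform average.
        self.base = [[0.0] * (l + 1) for l in range(num_blocks + 1)]
        self.slope = [[0.0] * (l + 1) for l in range(num_blocks + 1)]

    def weights(self, layer: int, t: int) -> List[float]:
        s = t / self.num_timesteps  # normalized timestep in [0, 1]
        logits = [b + k * s for b, k in zip(self.base[layer], self.slope[layer])]
        return softmax(logits)

    def aggregate(self, states: Sequence[Vector], layer: int, t: int) -> Vector:
        w = self.weights(layer, t)
        dim = len(states[0])
        return [sum(wj * st[i] for wj, st in zip(w, states)) for i in range(dim)]


def routed_forward(x: Vector, blocks: Sequence[Callable[[Vector], Vector]],
                   router: TimestepAdaptiveRouter, t: int) -> Vector:
    states = [x]  # states[0] is the input; states[j] is the output of block j - 1
    for layer, block in enumerate(blocks):
        block_input = router.aggregate(states, layer, t)  # replaces h + f(h)
        states.append(block(block_input))
    # The read-out is also a routed mix rather than the last residual sum.
    return router.aggregate(states, len(blocks), t)


if __name__ == "__main__":
    def identity(v: Vector) -> Vector:
        return list(v)

    router = TimestepAdaptiveRouter(num_blocks=4, num_timesteps=1000)
    x = [1.0, -2.0, 0.5]
    residual = list(x)
    for _ in range(4):
        residual = [a + b for a, b in zip(residual, identity(residual))]  # h + f(h)
    print("plain residual:", residual)  # each identity block doubles the stream
    print("routed:", routed_forward(x, [identity] * 4, router, t=500))
```

Because the weights come from a softmax, every block input is a convex combination of stored states, so its norm cannot exceed the largest stored norm (by the triangle inequality); this is one way a routed design can avoid the growth that additive accumulation produces. In a real model `base` and `slope` would be trainable parameters, and keeping every earlier output in memory is a simplification of this sketch; the paper's own design may differ.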

Entities

Institutions

  • arXiv
