ARTFEED — Contemporary Art Intelligence

Block-Based Double Decoders: A New Transformer Architecture

ai-technology · 2026-05-20

Researchers propose block-based double decoders, a novel transformer architecture that uses doubly-causal block-based attention masks. This design combines decoder-only training efficiency with encoder-decoder inference efficiency, addressing sparse supervision and dynamic sequence length issues in encoder-decoder models. Scaling law experiments show block-based double decoders outperform encoder-decoders and closely track decoder-only models. At inference, they reduce KV-cache memory and per-token compute by at least two-thirds without sacrificing prefill caching or other optimizations.
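The summary does not define the doubly-causal mask, so the sketch below does not reproduce it. It only illustrates what a generic block-based causal mask looks like: every position may attend to all positions in its own block and in earlier blocks, and to nothing in later blocks. The function names `block_causal_mask` and `render`, and the block size used, are hypothetical choices made for illustration.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """Generic block-causal mask (an assumption, NOT the paper's doubly-causal mask).

    Entry [query][key] is True when the query position may attend to the key
    position: the key's block must not come after the query's block.
    """
    if seq_len < 0 or block_size <= 0:
        raise ValueError("seq_len must be >= 0 and block_size must be > 0")
    # Map each position to the index of the block that contains it.
    block_of = [pos // block_size for pos in range(seq_len)]
    return [
        [block_of[key] <= block_of[query] for key in range(seq_len)]
        for query in range(seq_len)
    ]


def render(mask: list[list[bool]]) -> str:
    """Draw the mask as text: 'x' = may attend, '.' = masked out."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


if __name__ == "__main__":
    # Four positions split into two blocks of two.
    print(render(block_causal_mask(seq_len=4, block_size=2)))
```

With four positions and a block size of two, the first two rows see only the first block and the last two rows see both blocks. How the paper layers a second causal constraint on top of a block structure like this is not stated in the summary.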

Key facts

  • Block-based double decoders use doubly-causal block-based attention masks.
  • The architecture combines decoder-only training efficiency with encoder-decoder inference efficiency.
  • It addresses sparse supervision and dynamic sequence lengths in encoder-decoder models.
  • In scaling-law experiments, block-based double decoders outperform encoder-decoders.
  • Block-based double decoders closely track decoder-only models across scales.
  • Inference-time KV-cache memory and per-token compute are reduced by at least 2/3.
  • Existing inference optimizations for decoder-only models are preserved.
  • The paper was submitted to arXiv under Computer Science > Machine Learning.
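To put the reported KV-cache saving in concrete terms, here is a minimal back-of-the-envelope sketch. It uses the standard size estimate for a decoder-only KV cache (one key and one value tensor per layer) and applies the "at least two-thirds" reduction as an upper bound on what remains. The model dimensions are hypothetical and are not taken from the paper; the helper names are illustrative.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Standard KV-cache size estimate for one sequence in a decoder-only model.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes 16-bit storage (an assumption for this example).
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


def max_remaining_bytes(baseline_bytes: int, reduction_num: int = 2,
                        reduction_den: int = 3) -> int:
    """Upper bound on cache size after a reduction of at least num/den.

    A reduction of at least 2/3 leaves at most 1/3 of the baseline.
    The result is rounded up so it stays a valid upper bound.
    """
    numerator = baseline_bytes * (reduction_den - reduction_num)
    return -(-numerator // reduction_den)  # ceiling division


if __name__ == "__main__":
    # Hypothetical configuration, chosen only to make the arithmetic easy.
    baseline = kv_cache_bytes(num_layers=32, num_kv_heads=8,
                              head_dim=128, seq_len=6144)
    remaining = max_remaining_bytes(baseline)
    mib = 1024 * 1024
    print(f"baseline: {baseline / mib:.0f} MiB")
    print(f"after >= 2/3 reduction: at most {remaining / mib:.0f} MiB")
```

For this hypothetical configuration the baseline works out to 768 MiB per sequence, so the reported reduction would leave at most 256 MiB. The sketch covers memory only; the summary's per-token compute claim is not modeled here.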

Entities

Institutions

  • arXiv
