ARTFEED — Contemporary Art Intelligence

Head-Wise Adaptive Sparse Attention Accelerates Video Diffusion

ai-technology · 2026-05-16

A new training-free method, HASTE, accelerates video diffusion models by fixing two inefficiencies in sparse attention. Existing training-free sparse attention suffers from costly mask prediction and a single threshold shared across all attention heads, which limits the achievable speed-quality trade-off. HASTE introduces two components: Temporal Mask Reuse, which skips redundant mask prediction by tracking how far queries and keys have drifted across diffusion steps, and Error-guided Budgeted Calibration, which assigns each head its own sparsity threshold so as to minimize model-output error under a global compute budget. Evaluated on Wan2.1-1.3B, the method improves efficiency without any retraining.
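The first component can be illustrated with a minimal sketch. This is not the authors' code: the drift metric, tolerance value, and `MaskCache` interface are all assumptions made for illustration. The idea is to cache the sparse-attention mask and only rerun the expensive mask predictor when the query/key states have drifted enough since the cached step.

```python
# Hypothetical sketch of Temporal Mask Reuse (assumed interface, not the
# paper's implementation): reuse a cached sparse-attention mask when the
# query/key states have drifted little since the mask was last predicted.
import numpy as np


def relative_drift(current: np.ndarray, cached: np.ndarray) -> float:
    """Relative L2 change of the current Q/K states vs. the cached snapshot."""
    return float(np.linalg.norm(current - cached) / (np.linalg.norm(cached) + 1e-8))


class MaskCache:
    def __init__(self, drift_tol: float = 0.1):
        self.drift_tol = drift_tol   # illustrative tolerance, not from the paper
        self.cached_qk = None        # Q/K snapshot at the last mask prediction
        self.cached_mask = None

    def get_mask(self, qk: np.ndarray, predict_mask):
        """Return (mask, reused). `predict_mask` is the costly mask predictor."""
        if self.cached_mask is not None and relative_drift(qk, self.cached_qk) < self.drift_tol:
            return self.cached_mask, True        # small drift: skip prediction
        self.cached_mask = predict_mask(qk)      # large drift: recompute mask
        self.cached_qk = qk.copy()
        return self.cached_mask, False
```

In a denoising loop this would be called once per step; consecutive diffusion steps tend to produce similar attention patterns, so most steps hit the cache and only occasional steps pay the mask-prediction cost.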

Key facts

  • HASTE is a training-free video diffusion acceleration method.
  • It uses head-wise adaptive sparse attention.
  • Temporal Mask Reuse skips mask prediction based on query-key drift.
  • Error-guided Budgeted Calibration assigns per-head top-p thresholds.
  • It aims to improve the speed-quality trade-off in Video DiTs.
  • Tested on Wan2.1-1.3B.
  • Full attention has quadratic complexity.
  • Existing sparse attention uses shared thresholds despite head-level heterogeneity.
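The per-head calibration described above can be sketched as a greedy budget allocation. This is an assumed formulation, not the paper's algorithm: the error table, candidate top-p levels, and greedy marginal-gain rule are illustrative stand-ins for error-guided calibration under a global budget.

```python
# Hypothetical sketch of error-guided budgeted calibration (assumed
# formulation, not the paper's code): spend a global attention-density
# budget greedily on the heads where extra density cuts error the most.
import numpy as np


def calibrate_top_p(errors: np.ndarray, budget: float, levels: np.ndarray) -> np.ndarray:
    """
    errors[h, j]: estimated output error of head h at top-p level levels[j]
                  (non-increasing in j: denser attention, lower error).
    budget:       mean top-p allowed across all heads.
    Returns one top-p level per head, approximately minimizing total error.
    """
    n_heads, n_levels = errors.shape
    choice = np.zeros(n_heads, dtype=int)   # start every head at the sparsest level
    spent = levels[choice].sum()
    cap = budget * n_heads
    while True:
        best_gain, best_h = 0.0, -1
        for h in range(n_heads):
            j = choice[h]
            if j + 1 < n_levels:
                cost = levels[j + 1] - levels[j]
                gain = (errors[h, j] - errors[h, j + 1]) / cost  # error drop per unit density
                if spent + cost <= cap and gain > best_gain:
                    best_gain, best_h = gain, h
        if best_h < 0:
            break                            # budget exhausted or no head improves
        spent += levels[choice[best_h] + 1] - levels[choice[best_h]]
        choice[best_h] += 1
    return levels[choice]
```

The greedy rule reflects the head-level heterogeneity noted above: error-sensitive heads end up with dense (high top-p) attention while tolerant heads stay sparse, instead of every head sharing one threshold.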

Entities

Institutions

  • arXiv
