Head-Wise Adaptive Sparse Attention Accelerates Video Diffusion
A new training-free method, HASTE, accelerates video diffusion models by addressing inefficiencies in sparse attention. Current training-free sparse attention suffers from high mask prediction costs and from uniform sparsity thresholds shared across attention heads, which limit the achievable speed-quality trade-off. HASTE introduces two components: Temporal Mask Reuse, which skips redundant mask prediction across diffusion steps by tracking query-key drift, and Error-guided Budgeted Calibration, which assigns per-head sparsity thresholds so as to minimize model-output error under a global sparsity budget. Evaluated on Wan2.1-1.3B, the method improves efficiency without retraining.
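The idea behind Temporal Mask Reuse can be illustrated with a minimal sketch: recompute the sparse attention mask only when the query states have drifted beyond a tolerance since the last prediction, otherwise reuse the cached mask. The drift metric (mean cosine distance), the threshold `tau`, and the `predict_mask` callback are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine_drift(q_prev, q_cur):
    """Cosine distance between flattened query states of two diffusion steps.
    (Illustrative drift measure; the actual metric may differ.)"""
    a, b = q_prev.ravel(), q_cur.ravel()
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return 1.0 - cos

def maybe_reuse_mask(q_prev, q_cur, cached_mask, predict_mask, tau=0.05):
    """Reuse the cached sparse mask when query drift is below tau;
    otherwise re-run the (expensive) mask predictor."""
    if cached_mask is not None and cosine_drift(q_prev, q_cur) < tau:
        return cached_mask, True    # mask reused, predictor skipped
    return predict_mask(q_cur), False  # drift too large: recompute
```

Because adjacent denoising steps tend to produce similar query-key statistics, this check lets most steps skip the mask-prediction cost entirely.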
Key facts
- HASTE is a training-free video diffusion acceleration method.
- It uses head-wise adaptive sparse attention.
- Temporal Mask Reuse skips mask prediction based on query-key drift.
- Error-guided Budgeted Calibration assigns per-head top-p thresholds.
- It aims to improve the speed-quality trade-off in Video DiTs.
- Tested on Wan2.1-1.3B.
- Full attention has quadratic complexity in sequence length.
- Existing sparse attention uses shared thresholds despite head-level heterogeneity.
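The head-level heterogeneity noted above is why a shared threshold is wasteful: a peaky attention head concentrates its probability mass on a few keys and can be sparsified aggressively, while a flat head needs a larger budget. A minimal top-p masking sketch (the exact calibration procedure in HASTE is not reproduced here) shows how the kept set size varies per head even at the same `p`:

```python
import numpy as np

def top_p_mask(attn_row, p):
    """Boolean mask keeping the fewest attention entries whose
    probabilities sum to at least p (nucleus-style selection)."""
    order = np.argsort(attn_row)[::-1]          # sort keys by weight, descending
    csum = np.cumsum(attn_row[order])
    k = int(np.searchsorted(csum, p)) + 1       # smallest prefix reaching mass p
    mask = np.zeros_like(attn_row, dtype=bool)
    mask[order[:k]] = True
    return mask

# Hypothetical attention rows for two heads of differing sharpness:
row_peaky = np.array([0.9, 0.05, 0.03, 0.02])  # one key dominates
row_flat  = np.array([0.3, 0.25, 0.25, 0.2])   # mass spread evenly
```

At `p = 0.9`, the peaky head keeps a single key while the flat head must keep all four, so assigning per-head thresholds under a global budget recovers sparsity where quality permits it.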