Head-Wise Adaptive Sparse Attention Accelerates Video Diffusion
A new training-free method, HASTE, accelerates video diffusion models by addressing inefficiencies in sparse attention. Current training-free sparse attention suffers from high mask prediction costs and from uniform sparsity thresholds shared across attention heads, which limit the achievable speed-quality trade-off. HASTE introduces two components: Temporal Mask Reuse, which skips redundant mask prediction across diffusion steps by tracking query-key drift, and Error-guided Budgeted Calibration, which assigns per-head sparsity thresholds so as to minimize model-output error under a global sparsity budget. Evaluated on Wan2.1-1.3B, the method improves efficiency without retraining.
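The idea behind Temporal Mask Reuse can be illustrated with a minimal sketch: recompute the sparse attention mask only when the query states have drifted beyond a tolerance since the last prediction, otherwise reuse the cached mask. The drift metric (mean cosine distance), the threshold `tau`, and the `predict_mask` callback are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine_drift(q_prev, q_cur):
    """Cosine distance between flattened query states of two diffusion steps.
    (Illustrative drift measure; the actual metric may differ.)"""
    a, b = q_prev.ravel(), q_cur.ravel()
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return 1.0 - cos

def maybe_reuse_mask(q_prev, q_cur, cached_mask, predict_mask, tau=0.05):
    """Reuse the cached sparse mask when query drift is below tau;
    otherwise re-run the (expensive) mask predictor."""
    if cached_mask is not None and cosine_drift(q_prev, q_cur) < tau:
        return cached_mask, True    # mask reused, predictor skipped
    return predict_mask(q_cur), False  # drift too large: recompute
```

Because adjacent denoising steps tend to produce similar query-key statistics, this check lets most steps skip the mask-prediction cost entirely.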
Key facts
- HASTE is a training-free video diffusion acceleration method.
- It uses head-wise adaptive sparse attention.
- Temporal Mask Reuse skips mask prediction based on query-key drift.
- Error-guided Budgeted Calibration assigns per-head top-p thresholds.
- It aims to improve the speed-quality trade-off in Video DiTs.
- Tested on Wan2.1-1.3B.
- Full attention has quadratic complexity in sequence length.
- Existing sparse attention uses shared thresholds despite head-level heterogeneity.
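The head-level heterogeneity noted above is why a shared threshold is wasteful: a peaky attention head concentrates its probability mass on a few keys and can be sparsified aggressively, while a flat head needs a larger budget. A minimal top-p masking sketch (the exact calibration procedure in HASTE is not reproduced here) shows how the kept set size varies per head even at the same `p`:

```python
import numpy as np

def top_p_mask(attn_row, p):
    """Boolean mask keeping the fewest attention entries whose
    probabilities sum to at least p (nucleus-style selection)."""
    order = np.argsort(attn_row)[::-1]          # sort keys by weight, descending
    csum = np.cumsum(attn_row[order])
    k = int(np.searchsorted(csum, p)) + 1       # smallest prefix reaching mass p
    mask = np.zeros_like(attn_row, dtype=bool)
    mask[order[:k]] = True
    return mask

# Hypothetical attention rows for two heads of differing sharpness:
row_peaky = np.array([0.9, 0.05, 0.03, 0.02])  # one key dominates
row_flat  = np.array([0.3, 0.25, 0.25, 0.2])   # mass spread evenly
```

At `p = 0.9`, the peaky head keeps a single key while the flat head must keep all four, so assigning per-head thresholds under a global budget recovers sparsity where quality permits it.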