New PTQ Framework for W4A4 Quantization of Wan2.2-I2V Video Diffusion Transformers

ai-technology · 2026-05-27

A new post-training quantization (PTQ) framework targets W4A4 (4-bit weights, 4-bit activations) quantization of large video diffusion Transformers, addressing two obstacles: activation outliers and activation distributions that shift with the timestep. The method combines SVDQuant for low-rank outlier compensation with GPTQ for reconstruction-aware quantization of the residual weights, and it searches per-layer activation clipping ratios separately for each timestep bin and each expert. It targets the Mixture-of-Experts DiT architecture of Wan2.2-I2V, whose high-noise and low-noise experts differ in quantization sensitivity. On the OpenS2V-Eval benchmark, the method reportedly cuts peak GPU memory by 59.3% relative to the BF16 baseline while lowering the VBench average score by only 0.9%. The paper is available on arXiv under ID 2605.27003.
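To make the low-rank outlier compensation idea concrete, here is a minimal NumPy sketch, not the paper's implementation. It keeps the top singular directions of a weight matrix in full precision and quantizes only the residual to 4 bits. The function names, the rank of 4, the per-row scaling, and the synthetic weight matrix are all illustrative assumptions, and plain round-to-nearest stands in for GPTQ, which additionally uses calibration data to minimize reconstruction error.

```python
import numpy as np

def quantize_int4(x):
    """Symmetric round-to-nearest 4-bit quantization, one scale per row.

    Returns dequantized values so the error can be measured directly.
    NOTE: a simple stand-in for GPTQ, which is not implemented here.
    """
    max_abs = np.max(np.abs(x), axis=1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    return np.clip(np.round(x / scale), -8, 7) * scale

def split_low_rank(weight, rank):
    """Split weight into a rank-`rank` part (kept in high precision)
    and a residual (to be quantized)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return low_rank, weight - low_rank

# Synthetic weight (assumption): small noise plus one dominant
# rank-1 component that plays the role of outlier structure.
rng = np.random.default_rng(0)
weight = 0.1 * rng.normal(size=(64, 64)) + np.outer(
    5.0 * rng.normal(size=64), rng.normal(size=64)
)

low_rank, residual = split_low_rank(weight, rank=4)
approx_with_branch = low_rank + quantize_int4(residual)
approx_plain = quantize_int4(weight)

err_with_branch = float(np.mean((weight - approx_with_branch) ** 2))
err_plain = float(np.mean((weight - approx_plain) ** 2))
print(f"plain 4-bit MSE:            {err_plain:.6f}")
print(f"low-rank + 4-bit residual:  {err_with_branch:.6f}")
```

In this toy setting the low-rank branch absorbs the large-magnitude component, so the residual has a much narrower range and loses less when rounded to 4 bits.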

Key facts

Proposed framework combines SVDQuant, GPTQ, and timestep-bin-wise clipping-ratio search.
Addresses activation outliers and timestep-dependent distributions in Wan2.2-I2V.
Targets two-expert Mixture-of-Experts DiT design with distinct quantization sensitivities.
Achieves 59.3% peak GPU memory reduction relative to the BF16 baseline on OpenS2V-Eval benchmark.
Only 0.9% drop in VBench average score compared to BF16 baseline.
Published on arXiv with ID 2605.27003.
Method is post-training quantization (PTQ).
W4A4 quantization enables substantial memory savings for video diffusion Transformers.
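
The timestep-bin-wise clipping-ratio search listed above can be sketched roughly as follows, for a single layer of a single expert. This is a hedged illustration rather than the paper's procedure: the mean-squared-error objective, the candidate grid, the two bins, and the synthetic calibration activations are all assumptions, as are the function names.

```python
import numpy as np

def fake_quant_int4(x, clip_ratio):
    """Quantize to 4 bits after shrinking the range to clip_ratio * max|x|.

    Values beyond the clipped range saturate. Returns dequantized values.
    """
    scale = clip_ratio * np.max(np.abs(x)) / 7.0
    if scale == 0:
        return x.copy()
    return np.clip(np.round(x / scale), -8, 7) * scale

def quant_error(x, clip_ratio):
    """Mean squared error introduced by clipped 4-bit quantization."""
    return float(np.mean((x - fake_quant_int4(x, clip_ratio)) ** 2))

def search_clip_ratios(acts_by_bin, candidates):
    """Pick, for each timestep bin, the candidate ratio with lowest error."""
    best = {}
    for bin_id, acts in acts_by_bin.items():
        errors = [quant_error(acts, r) for r in candidates]
        best[bin_id] = float(candidates[int(np.argmin(errors))])
    return best

# Hypothetical calibration activations for two timestep bins.
rng = np.random.default_rng(0)
heavy_tailed = rng.normal(size=4096)
heavy_tailed[:8] *= 8.0                      # a few injected outliers
evenly_spread = rng.uniform(-1.0, 1.0, size=4096)
acts_by_bin = {0: heavy_tailed, 1: evenly_spread}

candidates = np.linspace(0.3, 1.0, 15)       # 1.0 means no clipping
ratios = search_clip_ratios(acts_by_bin, candidates)
for bin_id, ratio in ratios.items():
    print(f"bin {bin_id}: clip ratio {ratio:.2f}")
```

Running the search separately per bin (and, in the setting described above, per expert) lets a bin with heavy-tailed activations clip aggressively while a bin with evenly spread activations keeps nearly its full range.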
