Tail-Aware HiFloat4 Quantization for Wan2.2 Text-to-Video
Tail-Aware HiFloat4 is a post-training quantization (PTQ) method for the Wan2.2 text-to-video generation model. It adapts the ViDiT-Q PTQ pipeline to the HiFloat4 numerical format. The main linear layers in the Wan2.2 transformer modules are quantized with W4A4 fake quantization (4-bit weights and 4-bit activations), while boundary modules are kept in high precision. An activation-tail-aware percentile calibration module builds channel masks that limit the influence of rare outliers in the calibration data, and the pipeline also includes compact PTQ-state restoration. The runtime HiFloat4 arithmetic and the sampling pipeline are left unchanged. The method was submitted to the low-bit text-to-video generation quantization challenge and is described in a report on arXiv.
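The W4A4 fake-quantization step can be sketched in pure Python as below. This is a minimal illustration under stated assumptions, not the method's implementation. The HiFloat4 value set is not reproduced here, so `FP4_GRID` is a hypothetical stand-in that uses the magnitudes of a generic E2M1 4-bit float. The per-row weight scales and single per-tensor activation scale are also assumptions. The module names in `BOUNDARY_PREFIXES` are hypothetical and not confirmed Wan2.2 names.

```python
# Hypothetical stand-in grid: magnitudes of a generic E2M1 4-bit float.
# This is NOT the HiFloat4 encoding; substitute the real HiFloat4 value set.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = FP4_GRID[-1]

# Hypothetical boundary-module names; real Wan2.2 module names may differ.
BOUNDARY_PREFIXES = ("patch_embedding", "text_embedding", "time_embedding", "head")


def fake_quant(x: float, scale: float) -> float:
    """Quantize-dequantize one value: scale onto the grid, round to the nearest level, scale back."""
    if scale == 0.0:
        return 0.0
    mag = abs(x) / scale
    level = min(FP4_GRID, key=lambda g: abs(g - mag))  # values above FP4_MAX saturate
    return (level if x >= 0 else -level) * scale


def absmax_scale(values) -> float:
    """Choose a scale that maps the largest magnitude onto the top grid level."""
    m = max((abs(v) for v in values), default=0.0)
    return m / FP4_MAX


def w4a4_linear(x, weight, bias=None):
    """Fake-quantized y = W x + b: one scale per weight row, one scale for the activation vector."""
    a_scale = absmax_scale(x)
    xq = [fake_quant(v, a_scale) for v in x]
    out = []
    for i, row in enumerate(weight):
        w_scale = absmax_scale(row)
        acc = sum(fake_quant(w, w_scale) * a for w, a in zip(row, xq))
        out.append(acc + (bias[i] if bias is not None else 0.0))
    return out


def should_quantize(name: str) -> bool:
    """Quantize only linears inside transformer blocks; boundary modules stay in high precision."""
    if name.startswith(BOUNDARY_PREFIXES):
        return False
    return name.startswith("blocks.")
```

Fake quantization only simulates rounding. Values are snapped to the 4-bit grid and immediately scaled back, so the matrix product still runs in ordinary floating point. This is what lets a PTQ pipeline measure quantization error without a 4-bit kernel.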
Key facts
- Method: Tail-Aware HiFloat4
- Submission to low-bit text-to-video generation quantization challenge
- Adapts ViDiT-Q pipeline to Wan2.2
- Uses HiFloat4 numerical format
- Quantizes main linear layers with W4A4 fake quantization
- Keeps boundary modules in high precision
- Introduces activation-tail-aware percentile calibration for channel masks
- Includes compact PTQ-state restoration
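The tail-aware calibration and compact state restoration listed above might be sketched as follows. This is one plausible reading, not the report's algorithm. The masking rule flags a channel when its absolute maximum exceeds `tail_ratio` times its `q`-th percentile. That rule, the defaults `q=99.0` and `tail_ratio=4.0`, and the JSON state layout are all hypothetical. Masked channels are clipped at the percentile, so a single calibration spike cannot inflate their scale. Unmasked channels keep their maximum. Only these calibration outputs are persisted, which keeps the restorable state small.

```python
import json
import math


def percentile(sorted_vals, q):
    """Linear-interpolated q-th percentile (0-100) of an already sorted list."""
    if not sorted_vals:
        return 0.0
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def calibrate_channels(samples, q=99.0, tail_ratio=4.0):
    """Per-channel clipping ranges plus a tail mask (hypothetical rule).

    samples: non-empty list of calibration activations, each a list with one
    value per channel. A channel is masked when its absolute max exceeds
    tail_ratio times its q-th percentile, i.e. its range is set by rare
    outliers. Masked channels clip at the percentile; others keep the max.
    """
    num_channels = len(samples[0])
    clip, mask = [], []
    for c in range(num_channels):
        mags = sorted(abs(s[c]) for s in samples)
        p = percentile(mags, q)
        m = mags[-1]
        is_tail = p > 0 and m > tail_ratio * p
        mask.append(is_tail)
        clip.append(p if is_tail else m)
    return {"q": q, "tail_ratio": tail_ratio, "clip": clip, "mask": mask}


def save_state(state) -> str:
    """Serialize only the calibration outputs (clips and masks), not quantized weights."""
    return json.dumps(state, separators=(",", ":"))


def restore_state(blob: str):
    """Rebuild the calibration state without re-running calibration."""
    return json.loads(blob)
```

A per-channel scale for a 4-bit grid would then be `clip[c] / grid_max`. Restoring the saved state reproduces the same clips and masks, so the calibration pass can be skipped on later runs.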
Entities
Institutions
- arXiv