Tail-Aware HiFloat4 Quantization for Wan2.2 Text-to-Video

ai-technology · 2026-05-27

Tail-Aware HiFloat4 is a new quantization method for the Wan2.2 text-to-video generation model. It adapts the ViDiT-Q post-training quantization (PTQ) pipeline to the HiFloat4 numerical format, quantizing the main linear layers in the Wan2.2 transformer modules with W4A4 (4-bit weights, 4-bit activations) fake quantization while keeping boundary modules in high precision. An activation-tail-aware percentile calibration module constructs channel masks to reduce the impact of rare calibration outliers. The method leaves the runtime HiFloat4 arithmetic and the sampling pipeline unchanged. The work was submitted to the low-bit text-to-video generation quantization challenge and is described in a report on arXiv.
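The calibration idea can be illustrated with a minimal sketch. It is not the report's implementation: the function names, the 99th-percentile setting, the `tail_ratio` threshold, and the rule for building the mask are all assumptions made for illustration. The HiFloat4 encoding is not specified here, so a uniform signed 4-bit grid is swapped in as a stand-in for the fake-quantization step.

```python
import math

def percentile(values, q):
    """Linearly interpolated q-th percentile (0..100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def tail_aware_ranges(acts_by_channel, q=99.0, tail_ratio=2.0):
    """Return (mask, ranges), one entry per channel.

    mask[c] is True when the channel's absolute maximum exceeds
    tail_ratio times its q-th percentile magnitude, i.e. a few rare
    outliers dominate the range. Masked channels are calibrated to the
    percentile, the others to the absolute maximum.
    (Assumed rule, chosen for illustration only.)
    """
    mask, ranges = [], []
    for samples in acts_by_channel:
        mags = [abs(x) for x in samples]
        peak = max(mags)
        p = percentile(mags, q)
        heavy_tail = p > 0 and peak > tail_ratio * p
        mask.append(heavy_tail)
        ranges.append(p if heavy_tail else peak)
    return mask, ranges

def fake_quant_uniform4(x, clip):
    """Quantize then dequantize x on a uniform signed 4-bit grid.

    Stand-in only: this is NOT the HiFloat4 format.
    """
    if clip == 0:
        return 0.0
    scale = clip / 7.0                       # integer levels -7..7
    level = max(-7, min(7, round(x / scale)))
    return level * scale

# Channel 0 has one rare outlier; channel 1 is well behaved.
well_behaved = [i / 100 for i in range(100)]
with_outlier = well_behaved[:-1] + [50.0]
mask, ranges = tail_aware_ranges([with_outlier, well_behaved])
```

In this toy input only the first channel is masked, so its calibration range comes from the percentile and the single outlier no longer stretches the 4-bit grid for every other value in that channel.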

Key facts

Method: Tail-Aware HiFloat4
Submission to the low-bit text-to-video generation quantization challenge
Adapts ViDiT-Q pipeline to Wan2.2
Uses HiFloat4 numerical format
Quantizes main linear layers with W4A4 fake quantization
Keeps boundary modules in high precision
Introduces activation-tail-aware percentile calibration for channel masks
Includes compact PTQ-state restoration
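
The split between quantized linear layers and high-precision boundary modules can be sketched as a simple precision plan. The module names, the `boundary_keywords` tuple, and the plan labels below are hypothetical and do not come from the Wan2.2 code base or the report; the sketch only shows the selection logic.

```python
def plan_precision(modules, boundary_keywords=("embed", "head")):
    """Assign a precision label to each (name, kind) module pair.

    Linear layers whose name contains a boundary keyword stay in high
    precision, other linear layers get W4A4 fake quantization, and
    non-linear modules are left untouched.
    """
    plan = {}
    for name, kind in modules:
        if kind != "linear":
            plan[name] = "unchanged"
        elif any(key in name for key in boundary_keywords):
            plan[name] = "high_precision"
        else:
            plan[name] = "w4a4_fake_quant"
    return plan

# Hypothetical module listing, for illustration only.
modules = [
    ("text_embedding.0", "linear"),
    ("blocks.0.self_attn.q", "linear"),
    ("blocks.0.ffn.0", "linear"),
    ("blocks.0.norm1", "norm"),
    ("head.head", "linear"),
]
plan = plan_precision(modules)
```

Matching on name keywords keeps the plan easy to inspect; a real pipeline would more likely select boundary modules from an explicit configuration.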
