Jensen Bias in KV-Cache Quantization Hurts Video Diffusion Quality

ai-technology · 2026-05-27

A systematic bias in attention weights has been identified in chunk-wise autoregressive video diffusion models that quantize their KV cache. The bias, termed 'Jensen bias,' arises from the convexity of the exponential in softmax attention: quantization noise on cached keys, even when it averages to zero, inflates their expected exponentiated scores, so quantized keys take attention mass from the unquantized current chunk and video quality degrades. To address this, the researchers propose a per-attention-score correction that removes the bias in expectation. The correction is computed on the fly from the quantization step sizes and the query norm, adds negligible computational overhead, and requires no additional memory. The findings are presented in arXiv preprint 2605.26266.
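The convexity argument can be made concrete with a short derivation. This is a sketch under assumptions that are not taken from the preprint: the notation is ours, the quantization error is modeled as zero-mean and uniform on one step per coordinate, independent across coordinates and of the query, and scores use the usual 1/sqrt(d) scaling. The preprint's exact formulation may differ.

```latex
% Assumed notation: q query, k key, \hat{k} = k + \varepsilon quantized key,
% \Delta quantization step, d head dimension,
% \varepsilon_i \sim \mathrm{Uniform}(-\Delta/2, \Delta/2), so \mathrm{Var}(\varepsilon_i) = \Delta^2/12.
\begin{align*}
\mathbb{E}\!\left[e^{q^\top \hat{k}/\sqrt{d}}\right]
  &= e^{q^\top k/\sqrt{d}}\;\mathbb{E}\!\left[e^{q^\top \varepsilon/\sqrt{d}}\right]
  \;\ge\; e^{q^\top k/\sqrt{d}}\; e^{\mathbb{E}[q^\top \varepsilon]/\sqrt{d}}
  \;=\; e^{q^\top k/\sqrt{d}}
  && \text{(Jensen's inequality)} \\
\mathbb{E}\!\left[e^{q^\top \varepsilon/\sqrt{d}}\right]
  &\approx 1 + \frac{1}{2}\,\mathrm{Var}\!\left(\frac{q^\top \varepsilon}{\sqrt{d}}\right)
  \;=\; 1 + \frac{\Delta^2\,\lVert q \rVert^2}{24\,d}
  && \text{(second-order Taylor expansion)}
\end{align*}
```

Under these assumptions the inflation factor depends only on the step size and the query norm, which is consistent with the ingredients the summary attributes to the correction. Keys in the current chunk are unquantized and get no such inflation, so the softmax normalization shifts mass toward the cached keys.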

Key facts

arXiv:2605.26266 identifies Jensen bias in KV-cache quantization for video diffusion models.
Jensen bias causes quantized keys to steal attention mass from unquantized current chunks.
The bias stems from the convexity of the exponential in softmax attention.
A per-attention-score correction removes the bias in expectation.
The correction uses the quantization step sizes and the query norm.
A second-order Taylor approximation keeps the computational overhead negligible.
No additional memory is required for the correction.
The work targets chunk-wise autoregressive video diffusion models.
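
The facts above can be checked on toy data. The following is a minimal NumPy sketch, not the preprint's implementation. It assumes Gaussian queries and keys, a round-to-nearest quantizer with one shared step size, and a uniform model of the rounding error. Under that model a second-order expansion gives a per-score offset of scale^2 * step^2 * ||q||^2 / 24. The names `quantize`, `jensen_correction`, `cached_mass`, and `run` are hypothetical helpers introduced here for illustration.

```python
import numpy as np

def quantize(x, step):
    # Round-to-nearest uniform quantizer with a single shared step size (assumption).
    return np.round(x / step) * step

def jensen_correction(q, step, scale):
    # Second-order estimate of log E[exp(scale * q.eps)] for eps_i ~ Uniform(-step/2, step/2):
    # Var(scale * q.eps) = scale^2 * step^2 * ||q||^2 / 12, and half of that is the offset.
    return scale**2 * step**2 * np.dot(q, q) / 24.0

def cached_mass(q, k_cached, k_current, offset=0.0):
    # Softmax attention mass on the cached chunk; `offset` is subtracted from cached scores only.
    scale = 1.0 / np.sqrt(q.shape[0])
    s = np.concatenate([scale * (k_cached @ q) - offset, scale * (k_current @ q)])
    w = np.exp(s - s.max())  # subtract the max for numerical stability
    return w[: len(k_cached)].sum() / w.sum()

def run(trials=400, d=64, n=256, step=1.0, seed=0):
    # Average the cached-chunk attention mass over random queries and keys.
    rng = np.random.default_rng(seed)
    exact = quant = fixed = 0.0
    for _ in range(trials):
        q = rng.normal(size=d)
        k_old = rng.normal(size=(n, d))   # cached chunk, quantized below
        k_new = rng.normal(size=(n, d))   # current chunk, kept in full precision
        k_hat = quantize(k_old, step)
        c = jensen_correction(q, step, 1.0 / np.sqrt(d))
        exact += cached_mass(q, k_old, k_new)
        quant += cached_mass(q, k_hat, k_new)
        fixed += cached_mass(q, k_hat, k_new, offset=c)
    return exact / trials, quant / trials, fixed / trials

exact, quant, fixed = run()
print(f"exact={exact:.4f}  quantized={quant:.4f}  corrected={fixed:.4f}")
```

In this toy setup the quantized cache should receive more attention mass on average than the full-precision cache, and subtracting the offset should bring the average back toward the full-precision value. The offset is a scalar per query, so it needs no extra stored state. Real models use per-channel or per-group step sizes, so the single shared step here is a simplification.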
