LRA-EE: Early Exit Bypasses Quantization Collapse in CLIP

ai-technology · 2026-05-27

A recent study on arXiv (2605.26415) identifies a failure mode in quantized CLIP models that the authors term Quantization-Induced Representation Collapse (QIRC). In the INT8 CLIP ViT-B/32 model, activation noise accumulates across transformer layers and erodes the cosine alignment that zero-shot retrieval depends on. The noise-to-signal ratio rises from under 10% in the early layers to 52% at Layer 11. To address this, the authors propose LRA-EE (Layer-wise Representation-Aware Early Exit), which combines Spatio-Semantic Aggregation, a learned multi-feature gate, and a Layer-adaptive Confidence Threshold so that inference can exit before reaching the noise-dominated deeper layers.
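To make the per-layer noise-to-signal measurement concrete, here is a minimal NumPy sketch. It assumes symmetric per-tensor INT8 fake quantization and defines NSR as the relative L2 error ||x_int8 - x_fp32|| / ||x_fp32||; the study's exact definition is not given here, so treat that as an assumption. The stack of random residual blocks is a hypothetical toy, not CLIP, and it will not reproduce the reported 52% figure. It only shows how NSR and cosine alignment can be tracked layer by layer when a quantized path is compared against a full-precision reference.

```python
import numpy as np

def fake_quant_int8(x: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor INT8 fake quantization: quantize, then dequantize."""
    scale = np.max(np.abs(x)) / 127.0
    if scale == 0.0:
        return x.copy()
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

def noise_to_signal(ref: np.ndarray, noisy: np.ndarray) -> float:
    """Assumed NSR definition: relative L2 error of the noisy path vs. the reference."""
    return float(np.linalg.norm(noisy - ref) / np.linalg.norm(ref))

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def layerwise_nsr(num_layers: int = 12, dim: int = 512, seed: int = 0):
    """Hypothetical toy: run an FP32 path and a path whose activations are
    fake-quantized after every residual block; record (NSR, cosine) per layer."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0.0, 1.0 / np.sqrt(dim), (dim, dim)) for _ in range(num_layers)]
    x_fp = rng.normal(size=dim)
    x_q = fake_quant_int8(x_fp)
    stats = []
    for w in weights:
        x_fp = x_fp + np.tanh(w @ x_fp)                    # full-precision block
        x_q = fake_quant_int8(x_q + np.tanh(w @ x_q))      # same block, INT8 activations
        stats.append((noise_to_signal(x_fp, x_q), cosine(x_fp, x_q)))
    return stats

if __name__ == "__main__":
    for layer, (nsr, cos) in enumerate(layerwise_nsr()):
        print(f"layer {layer:2d}  NSR={nsr:.4f}  cos={cos:.5f}")
```

With real CLIP, the two paths would be the hidden states of an FP32 model and its INT8 counterpart on the same image, and the same two functions would be applied to each layer's output.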

Key facts

arXiv:2605.26415v1
INT8 quantization introduces a failure mode in CLIP that is distinct from the one seen in quantized CNN classifiers
Activation noise perturbs the direction of multimodal embeddings
Quantization-Induced Representation Collapse (QIRC) is characterized
Noise-to-signal ratio grows from below 10% to 52% at Layer 11 in INT8 CLIP ViT-B/32
LRA-EE (Layer-wise Representation-Aware Early Exit) is proposed
Spatio-Semantic Aggregation replaces the immature [CLS] token at shallow layers with a global average of patch tokens
Learned multi-feature gate uses confidence, top-2 margin, and spatial-activation variance
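The exit decision described above can be sketched in NumPy as follows. This is a hypothetical illustration, not the authors' implementation. Assumptions: each layer's tokens arrive as a (1+N, D) array with [CLS] at row 0; a hypothetical per-layer matrix `proj` maps the patch average into the joint image-text space; "spatial-activation variance" is taken to be the variance of patch-token L2 norms; the learned gate is modeled as a logistic function of the three features; the patch average is used at every exit layer; and `gate_w`, `gate_b`, and the per-layer `thresholds` are placeholders that would normally be fitted. All function names are illustrative.

```python
import numpy as np

def spatio_semantic_aggregate(tokens: np.ndarray) -> np.ndarray:
    """Average the patch tokens (rows 1..N), skipping the [CLS] token at row 0."""
    return tokens[1:].mean(axis=0)

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def gate_features(tokens, text_emb, proj, logit_scale=100.0):
    """Zero-shot class probabilities plus the three gate features for one layer.

    `proj` is a hypothetical per-layer projection into the joint space.
    """
    img = spatio_semantic_aggregate(tokens) @ proj
    img = img / np.linalg.norm(img)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    probs = softmax(logit_scale * (txt @ img))
    second, first = np.sort(probs)[-2:]  # two largest probabilities, ascending
    confidence = first
    margin = first - second
    # Assumed definition: variance of patch-token L2 norms across positions.
    spatial_var = float(np.var(np.linalg.norm(tokens[1:], axis=1)))
    return probs, np.array([confidence, margin, spatial_var])

def early_exit(layer_tokens, text_emb, projs, gate_w, gate_b, thresholds):
    """Return (layer, class) at the first layer whose gate score clears that
    layer's own threshold; otherwise the deepest evaluated layer's prediction."""
    result = None
    for layer, (tokens, proj, tau) in enumerate(zip(layer_tokens, projs, thresholds)):
        probs, feats = gate_features(tokens, text_emb, proj)
        score = 1.0 / (1.0 + np.exp(-(feats @ gate_w + gate_b)))  # logistic gate
        result = (layer, int(np.argmax(probs)))
        if score >= tau:                 # layer-adaptive threshold
            return result
    return result
```

For example, with `text_emb = np.eye(3, 4)`, identity projections, and layer-0 patch tokens that all point along the first class embedding, a gate of `gate_w = [4, 4, 0]`, `gate_b = -4` scores about 0.98 and exits at layer 0 for any threshold below that. Keeping one threshold per layer lets shallow exits be held to a stricter bar than deeper ones.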
