TORQ: Training-Free Framework for MXFP4 Quantization in LLMs
TORQ (Two-level Orthogonal Rotation for MXFP4 Quantization) is a proposed training-free post-training quantization framework that targets the accuracy degradation Large Language Models (LLMs) suffer when their activations are quantized to the Microscaling FP4 (MXFP4) format. The research, published as arXiv:2605.19561, traces that degradation to two structural imbalances in activation distributions: extreme variance imbalance between quantization blocks (inter-block) and uneven use of the FP4 codebook within a block (intra-block). TORQ reshapes the geometric properties of the activation space without any additional training, with the aim of making low-bit LLM inference practical.
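To make the two imbalances concrete, the following is a minimal NumPy sketch of MXFP4-style block quantization. It assumes the OCP Microscaling layout (blocks of 32 elements, one shared power-of-two scale per block, FP4 E2M1 elements). The function names and the synthetic data are hypothetical, ties are rounded naively, and nothing here is taken from the TORQ paper's implementation.

```python
import numpy as np

# Non-negative magnitudes representable by an FP4 E2M1 element.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 32  # elements sharing one scale in the MX format


def mxfp4_quantize(x):
    """Fake-quantize a 1-D array whose length is a multiple of BLOCK.

    Returns the dequantized values and the per-element codebook indices.
    Illustrative only: rounding picks the nearest grid point, with ties
    going to the lower index rather than round-to-nearest-even.
    """
    blocks = x.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    safe = np.where(amax > 0, amax, 1.0)  # avoid log2(0) for all-zero blocks
    # Shared power-of-two scale: floor(log2(amax)) minus the largest
    # E2M1 exponent (2), so the block maximum lands in [4, 8) before clipping.
    scale = 2.0 ** (np.floor(np.log2(safe)) - 2)
    scaled = np.clip(np.abs(blocks) / scale, 0.0, FP4_GRID[-1])
    idx = np.abs(scaled[..., None] - FP4_GRID).argmin(axis=-1)
    deq = np.sign(blocks) * FP4_GRID[idx] * scale
    return deq.reshape(x.shape), idx


def block_variances(x):
    """Variance of each BLOCK-sized chunk (inter-block imbalance)."""
    return x.reshape(-1, BLOCK).var(axis=1)


def code_histogram(idx):
    """Fraction of elements mapped to each FP4 magnitude (intra-block usage)."""
    return np.bincount(idx.ravel(), minlength=len(FP4_GRID)) / idx.size


# Synthetic activations: Gaussian noise plus one large outlier in block 0.
rng = np.random.default_rng(0)
x = rng.standard_normal(4 * BLOCK)
x[5] = 60.0

deq, idx = mxfp4_quantize(x)
var = block_variances(x)
print("max/min block variance:", var.max() / var.min())
print("code usage in outlier block:", code_histogram(idx[0]))
```

In this toy setup the single outlier forces a large shared scale on its block, so most of the remaining elements in that block collapse onto the smallest codes while the other blocks keep a much lower variance. That is the kind of inter-block and intra-block imbalance the summary refers to.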
Key facts
- TORQ is a training-free Post-Training Quantization (PTQ) framework
- It addresses MXFP4 activation quantization accuracy degradation
- Two root causes identified: inter-block variance imbalance and intra-block codebook utilization imbalance
- The work treats MXFP4 as a cornerstone format for next-generation low-bit inference
- The framework reshapes geometric properties of activation space
- Published as arXiv:2605.19561
- No additional training required
- Targets Large Language Models (LLMs)
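The idea of reshaping activation geometry with an orthogonal rotation can be sketched generically as follows. This is an illustration under stated assumptions, not TORQ's two-level construction, which this summary does not describe: it uses a single normalized Hadamard matrix as the rotation, and the names `hadamard`, `peak_to_rms`, and `variance_ratio` as well as the synthetic data are hypothetical.

```python
import numpy as np

BLOCK = 32  # MX block size, used only to measure per-block variance


def hadamard(n):
    """Normalized Hadamard matrix (Sylvester construction); n is a power of two."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)  # scaling makes the matrix orthogonal


def peak_to_rms(v):
    """Largest magnitude relative to the root-mean-square value."""
    return np.abs(v).max() / np.sqrt(np.mean(v ** 2))


def variance_ratio(v):
    """Ratio of the largest to the smallest per-block variance."""
    var = v.reshape(-1, BLOCK).var(axis=1)
    return var.max() / var.min()


rng = np.random.default_rng(0)
d = 128
x = rng.standard_normal(d)        # one activation vector
x[5] = 60.0                       # single outlier channel
W = rng.standard_normal((d, 16))  # stand-in weight matrix

R = hadamard(d)
x_rot = x @ R      # rotate activations
W_rot = R.T @ W    # fold the inverse rotation into the weights

# The layer output is unchanged because R @ R.T is the identity.
print("output preserved:", np.allclose(x @ W, x_rot @ W_rot))

ratio_before, ratio_after = variance_ratio(x), variance_ratio(x_rot)
print("peak/RMS before vs after:", peak_to_rms(x), peak_to_rms(x_rot))
print("block variance ratio before vs after:", ratio_before, ratio_after)
```

Because the rotation is orthogonal, it can be absorbed into the weights offline, which is consistent with a training-free approach: the product of rotated activations and rotated weights matches the original layer output, while the outlier's energy is spread across all channels before block quantization is applied.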