ARTFEED — Contemporary Art Intelligence

TORQ: Training-Free Framework for MXFP4 Quantization in LLMs

ai-technology · 2026-05-20

TORQ (Two-level Orthogonal Rotation for MXFP4 Quantization) is a newly proposed training-free post-training quantization (PTQ) framework. It targets the accuracy loss that Large Language Models (LLMs) suffer when their activations are quantized to the Microscaling FP4 (MXFP4) format. The research, published as arXiv:2605.19561, traces that loss to two structural imbalances in activation distributions: extreme variance imbalance between blocks, and uneven use of the FP4 codebook within each block. As its name suggests, TORQ applies orthogonal rotations at two levels to reshape the geometry of the activation space. It needs no additional training, and its stated aim is to make low-bit LLM inference practical.
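To make the two ideas above concrete, here is a minimal NumPy sketch of MXFP4-style "fake" quantization and of how an orthogonal rotation changes per-block variance. It is not TORQ's algorithm, which the summary does not spell out: the Hadamard matrix is a swapped-in, generic orthogonal rotation, and the names `mxfp4_fake_quant`, `hadamard`, and `block_variance_ratio` are hypothetical. The block size of 32, the power-of-two shared scale, and the E2M1 value grid follow the OCP Microscaling format; the tie-breaking in rounding is simplified.

```python
import numpy as np

# Magnitudes representable by one FP4 (E2M1) element.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 32  # MX block size: 32 elements share one power-of-two scale


def mxfp4_fake_quant(x):
    """Quantize then dequantize a 1-D array (length a multiple of BLOCK)."""
    blocks = x.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    # frexp gives amax = m * 2**exp with m in [0.5, 1), so floor(log2(amax)) = exp - 1.
    _, exp = np.frexp(amax)
    # Shared scale 2**(floor(log2(amax)) - 2); 2 is the largest E2M1 exponent.
    scale = np.ldexp(1.0, exp - 3)
    # Saturate at the largest FP4 magnitude, then snap to the nearest grid value.
    # Simplification: ties go to the smaller magnitude.
    scaled = np.minimum(np.abs(blocks) / scale, FP4_GRID[-1])
    idx = np.abs(scaled[..., None] - FP4_GRID).argmin(axis=-1)
    return (np.sign(blocks) * FP4_GRID[idx] * scale).reshape(x.shape)


def hadamard(n):
    """Orthonormal Sylvester-Hadamard matrix; n must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(h.shape[0])


def block_variance_ratio(x):
    """Largest block variance divided by the smallest."""
    v = x.reshape(-1, BLOCK).var(axis=1)
    return v.max() / v.min()


# Toy activation vector (an assumption for illustration): 8 blocks, one of
# them with 100x the standard deviation of the others.
rng = np.random.default_rng(0)
x = rng.standard_normal(8 * BLOCK)
x[:BLOCK] *= 100.0

H = hadamard(x.size)
y = H @ x  # rotated activations; H is orthogonal, so x == H.T @ y

ratio_before = block_variance_ratio(x)
ratio_after = block_variance_ratio(y)

# Quantize in the rotated space, then rotate back for comparison with x.
x_hat_plain = mxfp4_fake_quant(x)
x_hat_rotated = H.T @ mxfp4_fake_quant(y)

print("block variance ratio before rotation:", ratio_before)
print("block variance ratio after rotation: ", ratio_after)
print("MSE without rotation:", np.mean((x - x_hat_plain) ** 2))
print("MSE with rotation:   ", np.mean((x - x_hat_rotated) ** 2))
```

The sketch only demonstrates that a rotation spreads energy across blocks, which evens out the block variances. Whether that lowers reconstruction error depends on the data, so the two MSE values are printed for inspection rather than claimed as a result.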

Key facts

  • TORQ is a training-free Post-Training Quantization (PTQ) framework; no additional training is required
  • It targets the accuracy degradation caused by MXFP4 activation quantization in Large Language Models (LLMs)
  • Two root causes are identified: inter-block variance imbalance and intra-block codebook utilization imbalance
  • MXFP4 is described as a cornerstone format for next-generation low-bit inference
  • The framework reshapes the geometric properties of the activation space
  • Published as arXiv:2605.19561
