ARTFEED — Contemporary Art Intelligence

MXFP4 quantization error decomposition for LLM reinforcement learning

ai-technology · 2026-05-22

A recent arXiv paper (2605.20402) shows that the quantization error of MXFP4 in reinforcement learning (RL) for large language models (LLMs) can be decomposed into three components: scale bias from power-of-two rounding, deadzone truncation from zeroing small values, and grid noise from rounding to a 4-bit grid. Each component has its own failure mode: scale bias grows multiplicatively during the backward pass, deadzone truncation reduces the quality of rollouts, and grid noise influences training stability. The paper argues that current approaches, which treat quantization error as a single quantity, overlook these distinct mechanisms, and it offers both theoretical and empirical insight into how each one affects different RL training pathways.
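
To make the three-way split concrete, here is a minimal NumPy sketch of one way such a decomposition could be computed for a single block. It is an illustration under stated assumptions, not the paper's definitions: the function names (round_to_fp4, decompose_block), the choice of amax/6 as the "ideal" unrounded scale, and the telescoping split are all hypothetical. The FP4 (E2M1) magnitude grid and the floor-based power-of-two scale follow the usual MXFP4 convention, and ties are rounded toward the smaller magnitude for simplicity.

```python
import numpy as np

# Representable magnitudes of an FP4 (E2M1) element.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def round_to_fp4(y):
    """Round each value to the nearest FP4 magnitude, saturating at 6."""
    mag = np.minimum(np.abs(y), FP4_GRID[-1])
    # argmin picks the lower grid point on exact ties (a simplification).
    idx = np.abs(mag[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(y) * FP4_GRID[idx]


def decompose_block(x):
    """Split the MXFP4 error of one block into three additive parts.

    Assumption (illustrative only): the reference point is quantization
    with an unrounded scale amax / 6, and everything that changes when
    the scale is rounded to a power of two is attributed to scale bias.
    """
    x = np.asarray(x, dtype=np.float64)
    amax = np.abs(x).max()
    if amax == 0.0:
        zeros = np.zeros_like(x)
        return {"scale_bias": zeros, "deadzone": zeros.copy(),
                "grid_noise": zeros.copy(), "total": zeros.copy()}

    ideal_scale = amax / FP4_GRID[-1]                  # maps amax onto 6
    pow2_scale = 2.0 ** (np.floor(np.log2(amax)) - 2)  # shared power-of-two scale

    q_ideal = ideal_scale * round_to_fp4(x / ideal_scale)
    q_pow2 = pow2_scale * round_to_fp4(x / pow2_scale)

    ideal_err = q_ideal - x
    # Nonzero inputs that the grid rounds to exactly zero.
    in_deadzone = (q_ideal == 0.0) & (x != 0.0)

    return {
        "scale_bias": q_pow2 - q_ideal,
        "deadzone": np.where(in_deadzone, ideal_err, 0.0),
        "grid_noise": np.where(in_deadzone, 0.0, ideal_err),
        "total": q_pow2 - x,
    }
```

Calling decompose_block on a block of 32 values returns four arrays of the same shape, and by construction the three components sum to the total error. Inspecting them separately (for example, their norms) shows which mechanism dominates for a given tensor.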

Key facts

  • arXiv paper 2605.20402
  • MXFP4 arithmetic accelerates RL post-training of LLMs
  • Quantization error decomposed into three additive components
  • Scale bias from power-of-two rounding
  • Deadzone truncation from zeroing small values
  • Grid noise from rounding to nearest 4-bit grid
  • Scale bias affects gradient accuracy via backward pass
  • Deadzone truncation degrades rollout quality
  • Grid noise influences training stability
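
The multiplicative growth of scale bias through the backward pass can be illustrated with a toy calculation. This sketch assumes, purely for illustration, that each layer rescales the gradient by a factor (1 + b) for some per-layer relative bias b. The function name compounded_scale_bias is hypothetical and the model is not taken from the paper.

```python
import numpy as np


def compounded_scale_bias(per_layer_bias):
    """Relative gradient error after passing through every layer.

    Assumes each layer multiplies the gradient by (1 + b_l), so the
    factors compound as a product rather than adding up.
    """
    factors = 1.0 + np.asarray(per_layer_bias, dtype=np.float64)
    return float(np.prod(factors) - 1.0)
```

Under this assumption, two layers with a 10% bias each give a combined relative error of 21% rather than 20%, and the gap between the product and the sum widens as depth increases.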

Entities

Institutions

  • arXiv
