MXFP4 quantization error decomposition for LLM reinforcement learning

ai-technology · 2026-05-22

A recent paper on arXiv (2605.20402) shows that when MXFP4 arithmetic is used to accelerate reinforcement learning (RL) post-training of large language models (LLMs), the resulting quantization error decomposes into three additive components: scale bias from rounding the shared scale to a power of two, deadzone truncation from zeroing small values, and grid noise from rounding to the nearest point on the 4-bit grid. Each component drives a distinct failure mode: scale bias compounds multiplicatively through the backward pass, deadzone truncation degrades rollout quality, and grid noise affects training stability. The analysis suggests that existing methods, which treat quantization error as a single quantity, miss these separate mechanisms; the paper provides theoretical and empirical evidence for how each one affects the different RL training pathways.
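To make the three components concrete, here is a minimal Python sketch of one way to split MXFP4 error additively. It assumes the OCP Microscaling (MX) format: each block shares a power-of-two scale, and each element is an FP4 (E2M1) value with magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6}. The spec uses 32-element blocks; a shorter block is used here for readability. The "ideal" reference scale amax/6, the helper names (`round_to_grid`, `pow2_scale`, `ideal_scale`, `decompose_error`), and the telescoping split itself are hypothetical illustrations, not the paper's definitions.

```python
import math

# FP4 (E2M1) representable magnitudes, as in the OCP MX spec.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = 6.0
E2M1_EMAX = 2  # exponent of the largest E2M1 value (6 = 1.5 * 2**2)


def round_to_grid(v):
    """Round v to the nearest signed FP4 magnitude, saturating at +/-6.
    Simplification: ties resolve to the smaller magnitude."""
    mag = min(abs(v), FP4_MAX)
    nearest = min(FP4_GRID, key=lambda g: abs(g - mag))
    return math.copysign(nearest, v)


def quantize_block(xs, scale):
    """Fake-quantize a block: scale down, snap to the FP4 grid, scale back up."""
    return [scale * round_to_grid(x / scale) for x in xs]


def pow2_scale(xs):
    """Shared power-of-two scale, 2 ** (floor(log2(amax)) - E2M1_EMAX).
    Assumes the block contains at least one nonzero value."""
    amax = max(abs(x) for x in xs)
    return 2.0 ** (math.floor(math.log2(amax)) - E2M1_EMAX)


def ideal_scale(xs):
    """Hypothetical unconstrained scale mapping amax exactly onto FP4_MAX."""
    return max(abs(x) for x in xs) / FP4_MAX


def decompose_error(xs):
    """Split the MXFP4 error (q - x) into three terms that sum to it exactly.

    ASSUMPTION: illustrative telescoping split, not the paper's definition.
      scale_bias = Q(x; pow2 scale) - Q(x; ideal scale)
      deadzone   = -x where the ideal-scale quantizer outputs zero, else 0
      grid_noise = Q(x; ideal scale) - x where it outputs nonzero, else 0
    """
    q_pow2 = quantize_block(xs, pow2_scale(xs))
    q_ideal = quantize_block(xs, ideal_scale(xs))
    scale_bias = [a - b for a, b in zip(q_pow2, q_ideal)]
    deadzone = [-x if qi == 0.0 else 0.0 for x, qi in zip(xs, q_ideal)]
    grid_noise = [qi - x if qi != 0.0 else 0.0 for x, qi in zip(xs, q_ideal)]
    return q_pow2, scale_bias, deadzone, grid_noise


block = [0.9, -0.02, 0.31, 0.004, -0.7, 0.12]
# 0.9 saturates to 6 * 0.125 = 0.75 under the power-of-two scale.
q, sb, dz, gn = decompose_error(block)
```

In this block, the power-of-two scale (0.125) forces 0.9 to saturate at 0.75, which appears as a negative scale-bias term, while -0.02 and 0.004 fall into the deadzone and are zeroed. Measuring against an ideal real-valued scale is one design choice that makes the three terms sum exactly to the total error; other reference choices are possible.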

Key facts

arXiv paper 2605.20402
MXFP4 arithmetic accelerates RL post-training of LLMs
Quantization error decomposed into three additive components
Scale bias from power-of-two rounding
Deadzone truncation from zeroing small values
Grid noise from rounding to nearest 4-bit grid
Scale bias affects gradient accuracy via backward pass
Deadzone truncation degrades rollout quality
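The multiplicative growth of scale bias in the backward pass can be illustrated with a toy model. Assume, hypothetically, that quantizing each of L layers scales the backpropagated gradient by a factor (1 + b_l), where b_l is that layer's relative scale bias; the factors then multiply rather than add. The function name `backward_gain` is illustrative only, and a real backward pass involves much more than a scalar gain per layer.

```python
import math


def backward_gain(relative_biases):
    """Toy model (assumption): the gradient reaching the input is the product
    of per-layer gain factors (1 + b_l), one per quantized layer."""
    return math.prod(1.0 + b for b in relative_biases)


# A 1% bias in each of 32 layers compounds to roughly a 37% gradient error,
# larger than the 32% obtained by simply adding the per-layer biases.
gain = backward_gain([0.01] * 32)
relative_error = gain - 1.0
```

In this model, a bias of the same sign in every layer is the worst case; biases of mixed sign partly cancel. The point is only that per-layer errors compound geometrically with depth.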
