ARTFEED — Contemporary Art Intelligence

nGPT Architecture Enables Stable 4-Bit Training for LLMs

ai-technology · 2026-05-09

A recent arXiv paper reports that nGPT, an architecture that normalizes weights and hidden representations onto the unit hypersphere, shows greater resilience to low-precision arithmetic than standard, unnormalized architectures. This resilience enables stable end-to-end NVFP4 (4-bit) training without auxiliary techniques such as random Hadamard transforms or per-tensor scaling. The findings were validated on a 1.2B-parameter dense model and on hybrid Mamba-Transformer MoE models ranging from 3B to 30B parameters. The authors trace the robustness to the behavior of dot products under quantization: while quantization noise is uncorrelated in both standard and normalized architectures, the hypersphere constraint in nGPT induces weak positive correlations among the element-wise products, so signal accumulates constructively across the hidden dimension and 4-bit training remains efficient.
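At its core, the hypersphere constraint is an L2 normalization applied to weight and hidden-state vectors. A minimal sketch in plain Python (illustrative only, not the paper's implementation):

```python
import math

def to_hypersphere(v, eps=1e-12):
    """Project a vector onto the unit hypersphere (L2-normalize).
    nGPT applies this kind of constraint to weight and hidden-state
    vectors; this is a toy sketch, not the paper's code."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]

h = to_hypersphere([3.0, 4.0])
print(h)  # [0.6, 0.8] — unit length
```

After normalization, every dot product between such vectors is a cosine similarity bounded in [-1, 1], which keeps activations in a well-conditioned range for low-precision formats.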

Key facts

  • arXiv:2605.06067v1
  • nGPT constrains weights and hidden representations to the unit hypersphere
  • Enables stable end-to-end NVFP4 training
  • Validated on a 1.2B dense model and hybrid MoE models ranging from 3B to 30B parameters
  • Removes need for random Hadamard transforms and per-tensor scaling
  • Robustness traced to dot product behavior under quantization noise
  • Hypersphere constraint induces weak positive correlations among element-wise products
  • Constructive signal accumulation across hidden dimension
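The dot-product robustness listed above can be illustrated with a toy experiment: quantize unit-norm Gaussian vectors with a crude symmetric 4-bit quantizer (an assumed stand-in for NVFP4, whose block-scaled FP4 format this sketch does not reproduce) and measure the error in their dot product:

```python
import math
import random

def quant4(v):
    """Toy symmetric 4-bit quantizer: absmax scaling to integer
    levels in [-7, 7]. A stand-in for NVFP4, not the real format."""
    scale = max(abs(x) for x in v) / 7.0
    return [round(x / scale) * scale for x in v]

def unit(v):
    """L2-normalize onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

random.seed(1)
d, trials = 2048, 100
errs = []
for _ in range(trials):
    a = unit([random.gauss(0, 1) for _ in range(d)])
    b = unit([random.gauss(0, 1) for _ in range(d)])
    exact = sum(x * y for x, y in zip(a, b))
    noisy = sum(x * y for x, y in zip(quant4(a), quant4(b)))
    errs.append(abs(noisy - exact))
print(f"mean |dot-product error| at d={d}: {sum(errs) / trials:.4f}")
```

Because the vectors are unit-norm, the per-element quantization errors stay small relative to the entries, and the errors largely average out over the hidden dimension rather than compounding; the paper's analysis goes further, attributing nGPT's advantage to weak positive correlations among the element-wise products themselves.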

Entities

Institutions

  • arXiv

Sources