ARTFEED — Contemporary Art Intelligence

Muon Optimizer's Geometric Narrative Challenged by New Research

publication · 2026-05-13

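A new arXiv paper (arXiv:2605.11181v1) challenges the geometric justification for the Muon optimizer and introduces two alternatives, Freon and Kaon. Freon is a family of optimizers built on Schatten (quasi-)norms and computed with a QDWH-based iterative approximation; it interpolates between SGD and Muon and extrapolates into the quasi-norm regime. In experiments on GPT-2, the best-performing Schatten parameters fall in that quasi-norm regime, which cannot be represented by a unitarily invariant linear minimization oracle (LMO). Kaon, which the paper itself presents as an absurd optimizer, is offered as further evidence against the geometric narrative.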
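To make the idea of interpolating between SGD and Muon concrete, here is a minimal sketch and not the paper's algorithm. It assumes the standard fact that steepest descent under a Schatten-p norm (p > 1, dual exponent q) rescales the gradient's singular values by the power q - 1, so p = 2 leaves the gradient direction unchanged (SGD-like) and p = ∞ yields the orthogonal polar factor (Muon-like). The function name `spectral_power_update` and the exponent `alpha` are hypothetical, chosen for illustration; the sketch uses a full SVD rather than the QDWH-based approximation the paper describes, and it omits the normalization constant.

```python
import numpy as np

def spectral_power_update(grad, alpha):
    """Return U @ diag(s**alpha) @ Vt, where grad = U @ diag(s) @ Vt (thin SVD).

    alpha is a hypothetical knob for illustration only:
      alpha = 1 reproduces the gradient itself (SGD-like direction),
      alpha = 0 gives the polar factor U @ Vt (Muon-like orthogonalized direction).
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    # Clamp tiny singular values so negative exponents stay finite.
    s = np.maximum(s, 1e-12)
    # Scale the columns of U by s**alpha, then recombine with Vt.
    return (u * s**alpha) @ vt

rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 3))

sgd_like = spectral_power_update(grad, 1.0)    # same matrix as grad
muon_like = spectral_power_update(grad, 0.0)   # orthonormal columns
between = spectral_power_update(grad, 0.5)     # an intermediate update
```

Values of `alpha` between 0 and 1 flatten the singular-value spectrum only partially. How the paper maps its Schatten parameter onto such an exponent, and how it extends into the quasi-norm regime, is not specified in this summary.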

Key facts

  • arXiv:2605.11181v1 challenges the geometric narrative of the Muon optimizer.
  • The Freon optimizer family uses Schatten (quasi-)norms with a QDWH-based iterative approximation.
  • Freon interpolates between SGD and Muon and extrapolates into the quasi-norm regime.
  • The best Schatten parameters for GPT-2 lie in the quasi-norm regime, which is not representable by a unitarily invariant LMO.
  • Kaon optimizer is introduced as an absurd optimizer.
  • The paper has three contributions challenging the geometric narrative.
  • The paper argues that the Muon optimizer's success is not due to precise geometric structure.
  • The paper describes its QDWH-based approximation as provably optimal.

Entities

Institutions

  • arXiv