Muon Optimizer's Geometric Narrative Challenged by New Research
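A new arXiv paper challenges the geometric justification for the Muon optimizer and introduces two alternatives, Freon and Kaon. Freon is built on Schatten (quasi-)norms and relies on an approximation based on QDWH (the QR-based dynamically weighted Halley iteration). It interpolates between SGD and Muon and extends into the quasi-norm regime. In GPT-2 experiments, Freon performs best in that quasi-norm regime, which lies beyond what any linear minimization oracle (LMO) over a unitarily invariant norm can express. The paper also presents Kaon, which it calls an absurd optimizer, as further evidence against the geometric narrative.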
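The interpolation between SGD and Muon can be illustrated with a standard fact about Schatten norms. For p > 1, the linear minimization oracle over the unit Schatten-p ball keeps the gradient's singular vectors and reweights each singular value σ to σ^(q-1), where 1/p + 1/q = 1. At p = 2 this yields the normalized gradient (an SGD-like step). As p → ∞ it yields U V^T, the orthogonalized direction that Muon approximates. The NumPy sketch below is a hedged illustration, not the paper's Freon update. It uses an exact SVD rather than a QDWH-based approximation, it covers only true norms (p > 1) and not quasi-norms, and the function name `schatten_lmo` is hypothetical.

```python
import numpy as np


def schatten_lmo(G: np.ndarray, p: float) -> np.ndarray:
    """Hypothetical helper: LMO over the unit Schatten-p ball (p > 1),
    i.e. argmin_{||X||_{S_p} <= 1} <G, X>, computed with an exact SVD.

    p = 2   -> -G / ||G||_F  (normalized SGD direction)
    p = inf -> -U V^T        (orthogonalized direction Muon approximates)
    """
    if p <= 1:
        raise ValueError("this sketch only covers true norms with p > 1")
    # Dual exponent q with 1/p + 1/q = 1 (q = 1 for the spectral norm, p = inf).
    q = 1.0 if np.isinf(p) else p / (p - 1.0)
    U, sig, Vt = np.linalg.svd(G, full_matrices=False)
    # Hoelder equality case: weight each singular direction by sigma^(q - 1),
    # dropping numerically zero singular values.
    keep = sig > sig.max() * 1e-12
    s = np.where(keep, sig ** (q - 1.0), 0.0)
    s /= np.linalg.norm(s, ord=p)  # rescale onto the unit Schatten-p sphere
    # U * s scales the columns of U, i.e. computes U diag(s).
    return -(U * s) @ Vt


rng = np.random.default_rng(0)
G = rng.standard_normal((6, 4))
for p in (2.0, 4.0, np.inf):
    D = schatten_lmo(G, p)
    print(p, np.sum(G * D))  # <G, D> equals -||G||_{S_q}
```

Intermediate values sit between the two endpoints. For example, p = 4 gives weights σ^(1/3), which flatten the singular-value spectrum more than SGD does but less than Muon's full orthogonalization.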
Key facts
- arXiv:2605.11181v1 challenges the geometric narrative of the Muon optimizer.
- The Freon optimizer family uses Schatten (quasi-)norms with a QDWH-based iterative approximation.
- Freon interpolates between SGD and Muon and extrapolates into the quasi-norm regime.
- For GPT-2, the best Schatten parameters lie in the quasi-norm regime, which no unitarily invariant LMO can represent.
- The paper introduces Kaon, which it presents as an absurd optimizer.
- The paper makes three contributions that challenge the geometric narrative.
- According to the paper, Muon's success is not due to a precise geometric structure.
- QDWH-based approximation is provably optimal.
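QDWH is a classical iteration for the polar decomposition. For a full-column-rank matrix A = U diag(s) V^T, the orthogonal polar factor is U V^T, which is the orthogonalized direction that Muon approximates. Each step applies X ← X(aI + bXᵀX)(I + cXᵀX)⁻¹. The product is evaluated through a QR factorization instead of an explicit inverse, and the weights (a, b, c) are chosen from a running lower bound ℓ on the singular values. The NumPy sketch below implements only this standard polar-factor case. How Freon adapts QDWH to general Schatten exponents is not reproduced here. The helper names, the starting bound `l0`, and the tolerance are illustrative assumptions.

```python
import numpy as np


def dwh_weights(l: float) -> tuple[float, float, float]:
    """Dynamically weighted Halley coefficients (a, b, c) for a lower bound l
    on the smallest singular value of the current (scaled) iterate."""
    l = min(float(l), 1.0)  # guard against rounding pushing l slightly above 1
    gamma = np.cbrt(4.0 * (1.0 - l**2) / l**4)
    a = np.sqrt(1.0 + gamma) + 0.5 * np.sqrt(
        8.0 - 4.0 * gamma + 8.0 * (2.0 - l**2) / (l**2 * np.sqrt(1.0 + gamma))
    )
    b = (a - 1.0) ** 2 / 4.0
    c = a + b - 1.0
    return float(a), float(b), float(c)


def qdwh_polar(A: np.ndarray, l0: float = 1e-3, tol: float = 1e-10,
               max_iter: int = 20) -> np.ndarray:
    """Approximate the orthogonal polar factor U V^T of a full-column-rank
    matrix A = U diag(s) V^T with the QR-based DWH iteration."""
    m, n = A.shape
    # Frobenius norm >= spectral norm, so all singular values of X lie in (0, 1].
    X = A / np.linalg.norm(A)
    l = l0  # assumed lower bound on sigma_min(X); a poor guess only slows convergence
    for _ in range(max_iter):
        a, b, c = dwh_weights(l)
        # QR of [sqrt(c) X; I] gives Q1 Q2^T = sqrt(c) X (I + c X^T X)^{-1}
        # without forming the inverse explicitly.
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, np.eye(n)]))
        Q1, Q2 = Q[:m], Q[m:]
        X_new = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.T)
        l = l * (a + b * l**2) / (1.0 + c * l**2)  # updated singular-value lower bound
        if np.linalg.norm(X_new - X) < tol:
            return X_new
        X = X_new
    return X


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
P = qdwh_polar(A)
print(np.round(P.T @ P, 6))  # close to the 4x4 identity
```

With c = a + b − 1 and b = (a − 1)²/4, each step maps singular values in (0, 1] back into (0, 1] without decreasing them. As ℓ approaches 1, the weights reduce to plain Halley iteration, (a, b, c) = (3, 1, 3).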
Entities
Institutions
- arXiv