Spectral Shaping Improves Muon Optimizer for LLM Training

publication · 2026-05-25

A new arXiv paper introduces DynMuon, a variant of the Muon optimizer that applies spectral shaping to the update matrix. Standard Muon takes the singular value decomposition of the update matrix, M = UΣV^T, and replaces M with its polar factor UV^T, which sets every singular value to 1. DynMuon generalizes this to UΣ^p V^T, where the exponent p is adjusted according to local curvature, stochastic gradient noise, and training stage; p = 0 recovers standard Muon. According to the paper's theory and experiments, positive p values accelerate early training by emphasizing high-curvature directions, while mildly negative p values help in later stages by shifting weight toward low-curvature directions. The authors describe this behavior as previously overlooked and present it as a dynamic way to improve convergence in large language model training.
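The shaping step described above can be sketched in a few lines of NumPy. This is only an illustration of the formula UΣ^p V^T, not the paper's implementation: the function name `spectral_shape`, the `eps` floor on singular values, and the use of an exact SVD are all assumptions made here for clarity (practical Muon implementations approximate the polar factor with a Newton-Schulz iteration rather than computing an SVD, and how DynMuon chooses p is not reproduced).

```python
import numpy as np


def spectral_shape(M, p, eps=1e-12):
    """Return U @ diag(s**p) @ V^T for the thin SVD M = U @ diag(s) @ V^T.

    Hypothetical helper for illustration only.
    p = 0 gives the polar factor U V^T (the standard Muon update direction),
    p = 1 returns M unchanged, and p < 0 boosts small singular values.
    `eps` is an assumed floor that avoids dividing by zero when p < 0.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    shaped = np.maximum(s, eps) ** p
    # Scaling the columns of U by `shaped` is the same as U @ diag(shaped).
    return (U * shaped) @ Vt


rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))

polar = spectral_shape(M, 0.0)     # all singular values become 1
early = spectral_shape(M, 0.5)     # positive p keeps large singular values larger
late = spectral_shape(M, -0.25)    # mildly negative p favors small singular values

for name, X in [("p=0", polar), ("p=0.5", early), ("p=-0.25", late)]:
    print(name, np.round(np.linalg.svd(X, compute_uv=False), 3))
```

Printing the singular values of each result makes the effect visible: with p = 0 they are all equal, with positive p the largest directions of M stay dominant, and with negative p the ordering is inverted so the weakest directions of M receive the largest step.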

Key facts

Muon is a leading optimizer for training large language models.
Standard Muon replaces the update matrix with its polar factor UV^T.
DynMuon uses UΣ^p V^T for spectral shaping.
The exponent p depends on local curvature, stochastic gradient noise, and training stage.
Positive p helps early training by emphasizing high-curvature directions.
Mildly negative p helps later training by focusing on low-curvature directions.
The paper is arXiv:2605.17109.
The work reveals a previously overlooked behavior in Muon-like updates.
