ARTFEED — Contemporary Art Intelligence

Spectral Shaping Improves Muon Optimizer for LLM Training

publication · 2026-05-25

A new arXiv paper (arXiv:2605.17109) introduces DynMuon, a variant of the Muon optimizer that applies spectral shaping to the update matrix. Standard Muon replaces the update matrix M = UΣV^T with its polar factor UV^T, which sets every singular value to 1. DynMuon generalizes this to UΣ^p V^T, where the exponent p is adjusted according to local curvature, stochastic gradient noise, and training stage; p = 0 recovers standard Muon. According to the paper's theory and experiments, positive values of p accelerate early training by emphasizing high-curvature directions, while mildly negative values help in later stages by shifting the emphasis to low-curvature directions. The paper presents this as a previously overlooked behavior of Muon-like updates, and as a way to improve convergence in large language model training by varying p as training progresses.
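The spectral shaping step itself can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function name spectral_shape, the eps clamp, and the use of an explicit SVD are assumptions made for illustration, and the rule DynMuon uses to choose p is not reproduced here.

```python
import numpy as np

def spectral_shape(M, p, eps=1e-12):
    """Return U diag(s**p) V^T, where M = U diag(s) V^T is the thin SVD.

    Hypothetical helper for illustration only.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Assumption: clamp singular values so a negative p never divides by zero.
    shaped = np.maximum(s, eps) ** p
    # Scaling the columns of U by the shaped values equals U @ diag(shaped).
    return (U * shaped) @ Vt

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))       # stand-in for an update matrix

polar = spectral_shape(M, 0.0)        # p = 0: polar factor UV^T (standard Muon)
identity_case = spectral_shape(M, 1.0)  # p = 1: the original matrix M
late_stage = spectral_shape(M, -0.5)  # mildly negative p: small singular values are boosted
```

With p = 0 every singular value becomes 1, so the result has orthonormal columns; with p = 1 the matrix is unchanged. Muon implementations typically approximate the polar factor with Newton-Schulz iterations rather than an explicit SVD, so the SVD here serves only to make the exponent's effect easy to see.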

Key facts

  • Muon is a prominent optimizer for training large language models.
  • Standard Muon replaces the update matrix with its polar factor UV^T.
  • DynMuon uses UΣ^p V^T for spectral shaping.
  • The exponent p is adjusted based on local curvature, stochastic gradient noise, and training stage.
  • Positive p helps early training by emphasizing high-curvature directions.
  • Mildly negative p helps later training by focusing on low-curvature directions.
  • The paper is arXiv:2605.17109.
  • The work reveals a previously overlooked behavior in Muon-like updates.

Entities

Institutions

  • arXiv
