ARTFEED — Contemporary Art Intelligence

HTMuon: Heavy-Tailed Spectral Correction Improves Muon Optimizer

other · 2026-05-25

HTMuon, an optimizer introduced in arXiv:2603.10067, modifies the Muon optimizer to address two weaknesses: Muon's orthogonalized update rule suppresses heavy-tailed weight spectra, and it over-emphasizes noise-dominated directions. Motivated by Heavy-Tailed Self-Regularization (HT-SR) theory, HTMuon produces heavier-tailed updates and induces heavier-tailed weight spectra while preserving Muon's ability to capture interdependencies among parameters. In LLM pretraining and image classification experiments, HTMuon consistently outperforms leading baselines, and it can be plugged into existing Muon variants. In LLaMA pretraining on the C4 dataset, it reduces perplexity by up to 0.98 relative to Muon. On the theory side, HTMuon corresponds to steepest descent under the Schatten-q norm.
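
The contrast between an orthogonalized update and a heavier-tailed one can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the paper's implementation: the function names and the `power` exponent are hypothetical, the power map on singular values is one simple way to keep spread in the update spectrum, and an exact SVD stands in for the Newton-Schulz iteration that Muon uses in practice.

```python
import numpy as np

def orthogonalized_update(grad):
    """Muon-style direction: keep singular vectors, set every singular value to 1."""
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt

def power_spectrum_update(grad, power=0.5):
    """Hypothetical heavier-tailed direction: map singular values s -> s**power.

    power=0 recovers the orthogonalized update; power=1 gives the gradient
    rescaled so that its largest singular value is 1.
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    if s[0] == 0.0:                  # zero gradient: nothing to update
        return np.zeros_like(grad)
    s_scaled = s / s[0]              # s is sorted in descending order
    return (u * s_scaled**power) @ vt  # scales column j of u by s_scaled[j]**power

rng = np.random.default_rng(0)
grad = rng.standard_normal((6, 4))   # stand-in for a gradient or momentum matrix

muon_like = orthogonalized_update(grad)
heavy_tailed = power_spectrum_update(grad, power=0.5)

# The first spectrum is flat; the second keeps a spread of singular values.
print(np.linalg.svd(muon_like, compute_uv=False))
print(np.linalg.svd(heavy_tailed, compute_uv=False))
```

With `power=0` the second function reduces to the first, which makes the flat spectrum of the orthogonalized update a limiting case of the power map rather than a separate rule.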

Key facts

  • HTMuon improves the Muon optimizer via heavy-tailed spectral correction.
  • Muon's orthogonalized update rule suppresses heavy-tailed weight spectra.
  • HTMuon is motivated by Heavy-Tailed Self-Regularization (HT-SR) theory.
  • HTMuon produces heavier-tailed updates and induces heavier-tailed weight spectra.
  • Experiments on LLM pretraining and image classification show improved performance.
  • HTMuon can serve as a plug-in for existing Muon variants.
  • In LLaMA pretraining on the C4 dataset, HTMuon reduces perplexity by up to 0.98 relative to Muon.
  • HTMuon corresponds to steepest descent under the Schatten-q norm.
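
For reference, the generic steepest-descent direction under a Schatten-q norm follows from Hölder duality on singular values. The block below states that standard result for a gradient matrix G with singular value decomposition G = U diag(σ) Vᵀ; the symbols G, D, σ, and p are notation introduced here, and the paper's exact parametrization of HTMuon may differ.

```latex
% Steepest-descent direction under the Schatten-q norm, for 1 < q < \infty,
% with dual exponent p defined by 1/p + 1/q = 1.
\[
  D^\star
  = \arg\max_{\|D\|_{S_q} \le 1} \langle G, D \rangle
  = \frac{U \,\mathrm{diag}\!\left(\sigma_i^{\,p-1}\right) V^\top}{\|\sigma\|_p^{\,p-1}},
  \qquad
  \langle G, D^\star \rangle = \|G\|_{S_p}.
\]
% Limiting cases:
%   q \to \infty (spectral norm), p \to 1:  D^\star = U V^\top   (orthogonalized update)
%   q = 2 (Frobenius norm),       p = 2:    D^\star = G / \|G\|_F (normalized gradient)
```

Since p - 1 = 1/(q - 1), a finite q keeps the singular values of the update unequal, whereas the spectral-norm limit flattens them all to 1.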

Entities

Institutions

  • arXiv
