ARTFEED — Contemporary Art Intelligence

New Adaptive Optimization Method Bridges SGD and Muon

ai-technology · 2026-05-20

A new paper on arXiv introduces a data-driven criterion for dynamically selecting proxy-optimal update geometries, layer by layer, when training deep neural networks. The criterion has a closed form: it is derived from gradient and activation statistics using a surrogate model, a single-step random feature regression. Existing optimizers such as SGD, Muon, Adam, and MuAdam appear as special cases of the resulting framework. According to the paper, the criterion can be computed at scale with efficient computational strategies, and the adaptive approach may improve training dynamics across diverse architectures.

Key facts

  • Paper arXiv:2605.19781 introduces adaptive optimization via Schatten-p norms.
  • Method dynamically chooses a proxy-optimal linear minimization oracle (LMO) geometry for each layer.
  • Closed-form criterion derived from gradient and activation statistics via a single-step random feature regression surrogate.
  • Recovers SGD, Muon, Adam, and MuAdam as extreme cases of the framework.
  • Scalable via efficient computational strategies.
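
The article does not reproduce the paper's formulas. As background for the Schatten-p and LMO terms in the list above, here is a minimal NumPy sketch of the standard linear minimization oracle over the unit Schatten-p ball. It assumes, as the headline suggests, that the geometry family interpolates between SGD (p = 2, which yields the normalized gradient) and Muon (p = ∞, which yields the orthogonalized update). The names `schatten_norm` and `schatten_lmo` are hypothetical. The sketch does not implement the paper's data-driven criterion for choosing p, and it does not cover the Adam or MuAdam cases.

```python
import numpy as np

# Hypothetical helper names. Assumes the geometry family is the unit ball of a
# Schatten-p norm; this is NOT the paper's criterion for choosing p.


def schatten_norm(m: np.ndarray, p: float) -> float:
    """Schatten-p norm: the l_p norm of the singular values of m."""
    return float(np.linalg.norm(np.linalg.svd(m, compute_uv=False), ord=p))


def schatten_lmo(grad: np.ndarray, p: float) -> np.ndarray:
    """Return X minimizing <grad, X> subject to ||X||_{S_p} <= 1.

    Schatten norms are unitarily invariant, so X keeps grad's singular
    vectors and only its singular values are reshaped.
    """
    if p < 1:
        raise ValueError("Schatten-p is a norm only for p >= 1")
    u, sigma, vt = np.linalg.svd(grad, full_matrices=False)
    if not np.any(sigma > 0):
        return np.zeros(grad.shape)  # zero gradient: X = 0 is an optimal choice
    if np.isinf(p):
        # Spectral-norm ball (dual q = 1): all singular values set to 1,
        # giving the orthogonalized update -U V^T used by Muon.
        s = np.ones_like(sigma)
    elif p == 1:
        # Nuclear-norm ball (dual q = inf): all weight on the top singular
        # pair (np.linalg.svd sorts singular values in descending order).
        s = np.zeros_like(sigma)
        s[0] = 1.0
    else:
        q = p / (p - 1.0)  # dual exponent: 1/p + 1/q = 1
        # s_i = sigma_i^(q-1) / ||sigma||_q^(q-1); p = 2 gives -grad / ||grad||_F.
        s = sigma ** (q - 1.0) / np.linalg.norm(sigma, ord=q) ** (q - 1.0)
    return -(u * s) @ vt  # scale columns of U by s, then multiply by V^T


rng = np.random.default_rng(0)
g = rng.standard_normal((4, 3))
for p in (2.0, 4.0, np.inf):
    x = schatten_lmo(g, p)
    # Each update lies on the unit Schatten-p sphere, so the norm is about 1.0.
    print(p, round(schatten_norm(x, p), 6))
```

Changing p only reshapes the singular values; the singular vectors come straight from the gradient. A per-layer scheme of the kind the article describes would pick p separately for each weight matrix, but how the paper makes that choice is not covered by this sketch.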

Entities

Institutions

  • arXiv

Sources