Optimizer-Induced Mode Connectivity in Neural Networks
A recent paper on arXiv (2605.09991) examines how optimizers such as AdamW and Muon shape mode connectivity in neural networks. The study shows that for two-layer ReLU networks at large width, the solutions reached by a single optimizer form a connected set, a result not established in earlier work. Depending on the type of regularization, the regions reached by different optimizers may overlap or remain disjoint. At small widths, AdamW and Muon can land in disconnected zero-loss components separated by a demonstrable loss barrier. In GPT-2 pretraining experiments, interpolation paths between models trained with the same optimizer preserve each model's spectrum, while paths between models trained with different optimizers exhibit a gradual spectral transition.
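The loss barrier mentioned above is conventionally measured along a linear interpolation between two solutions: if the loss stays near the line joining the endpoint losses, the solutions are (linearly) mode-connected; a large excess indicates a barrier. The sketch below illustrates this measurement on a toy two-layer ReLU network; the data, sizes, and the random stand-ins for optimizer solutions are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (shapes are illustrative assumptions).
X = rng.normal(size=(64, 5))
y = np.maximum(X @ rng.normal(size=(5, 8)), 0.0) @ rng.normal(size=(8, 1))

def loss(params):
    """Mean-squared error of a two-layer ReLU network f(x) = relu(x W1) W2."""
    W1, W2 = params
    return float(np.mean((np.maximum(X @ W1, 0.0) @ W2 - y) ** 2))

def interpolate(pa, pb, alpha):
    """Linear interpolation (1 - alpha) * theta_A + alpha * theta_B."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(pa, pb)]

# Stand-ins for solutions found by two different optimizers
# (random here; in practice these would be trained checkpoints).
theta_a = [rng.normal(size=(5, 8)), rng.normal(size=(8, 1))]
theta_b = [rng.normal(size=(5, 8)), rng.normal(size=(8, 1))]

alphas = np.linspace(0.0, 1.0, 21)
path_losses = [loss(interpolate(theta_a, theta_b, a)) for a in alphas]

# Barrier: largest excess of the path loss over the straight line
# between the endpoint losses (near zero suggests connectivity).
endpoints = [(1 - a) * path_losses[0] + a * path_losses[-1] for a in alphas]
barrier = max(p - e for p, e in zip(path_losses, endpoints))
print(f"loss barrier along the path: {barrier:.4f}")
```

For two random parameter settings the barrier is typically large; the paper's claim is about where trained solutions of specific optimizers sit relative to each other.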
Key facts
- arXiv paper 2605.09991
- Studies optimizer-induced mode connectivity
- Focuses on AdamW, Muon, and Lion-𝒦 family
- Solutions of two-layer ReLU networks form a connected set at large width
- Different optimizers can yield disjoint or overlapping regions
- Small-width example shows loss barrier between AdamW and Muon
- GPT-2 pretraining experiments conducted
- Same-optimizer paths preserve model spectrum
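The spectrum-preservation observation can be probed by tracking the singular values of a weight matrix along the interpolation path and comparing them to the endpoint spectra. A minimal sketch, assuming random matrices as stand-ins for trained weights:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-ins for a weight matrix from two checkpoints (illustrative).
W_a = rng.normal(size=(32, 32))
W_b = rng.normal(size=(32, 32))

def spectrum(W):
    """Singular values of W in descending order."""
    return np.linalg.svd(W, compute_uv=False)

s_a, s_b = spectrum(W_a), spectrum(W_b)
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    W = (1 - alpha) * W_a + alpha * W_b
    s = spectrum(W)
    # Distance of the path spectrum from each endpoint spectrum:
    # small distances throughout would indicate spectrum preservation.
    d_a = np.linalg.norm(s - s_a)
    d_b = np.linalg.norm(s - s_b)
    print(f"alpha={alpha:.2f}  dist_to_A={d_a:.2f}  dist_to_B={d_b:.2f}")
```

On trained checkpoints, the paper's finding would correspond to both distances staying small for same-optimizer pairs, and drifting smoothly from one endpoint to the other for cross-optimizer pairs.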