Optimizer-Induced Mode Connectivity in Neural Networks
A recent paper on arXiv (2605.09991) examines how optimizers such as AdamW and Muon shape mode connectivity in neural networks. The study shows that for two-layer ReLU networks at large width, the solutions reached by a single optimizer form a connected set, a result not established in earlier work. Depending on the type of regularization, the regions reached by different optimizers may overlap or remain disjoint. At small widths, AdamW and Muon can land in disconnected zero-loss components separated by a demonstrable loss barrier. In GPT-2 pretraining experiments, interpolation paths between models trained with the same optimizer preserve each model's spectrum, while paths between models trained with different optimizers exhibit a gradual spectral transition.
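The loss barrier mentioned above is conventionally measured along a linear interpolation between two solutions: if the loss stays near the line joining the endpoint losses, the solutions are (linearly) mode-connected; a large excess indicates a barrier. The sketch below illustrates this measurement on a toy two-layer ReLU network; the data, sizes, and the random stand-ins for optimizer solutions are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (shapes are illustrative assumptions).
X = rng.normal(size=(64, 5))
y = np.maximum(X @ rng.normal(size=(5, 8)), 0.0) @ rng.normal(size=(8, 1))

def loss(params):
    """Mean-squared error of a two-layer ReLU network f(x) = relu(x W1) W2."""
    W1, W2 = params
    return float(np.mean((np.maximum(X @ W1, 0.0) @ W2 - y) ** 2))

def interpolate(pa, pb, alpha):
    """Linear interpolation (1 - alpha) * theta_A + alpha * theta_B."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(pa, pb)]

# Stand-ins for solutions found by two different optimizers
# (random here; in practice these would be trained checkpoints).
theta_a = [rng.normal(size=(5, 8)), rng.normal(size=(8, 1))]
theta_b = [rng.normal(size=(5, 8)), rng.normal(size=(8, 1))]

alphas = np.linspace(0.0, 1.0, 21)
path_losses = [loss(interpolate(theta_a, theta_b, a)) for a in alphas]

# Barrier: largest excess of the path loss over the straight line
# between the endpoint losses (near zero suggests connectivity).
endpoints = [(1 - a) * path_losses[0] + a * path_losses[-1] for a in alphas]
barrier = max(p - e for p, e in zip(path_losses, endpoints))
print(f"loss barrier along the path: {barrier:.4f}")
```

For two random parameter settings the barrier is typically large; the paper's claim is about where trained solutions of specific optimizers sit relative to each other.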
Key facts
- arXiv paper 2605.09991
- Studies optimizer-induced mode connectivity
- Focuses on AdamW, Muon, and Lion-𝒦 family
- Solutions of two-layer ReLU networks form a connected set at large width
- Different optimizers can yield disjoint or overlapping regions
- Small-width example shows loss barrier between AdamW and Muon
- GPT-2 pretraining experiments conducted
- Same-optimizer paths preserve model spectrum
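The spectrum-preservation observation can be probed by tracking the singular values of a weight matrix along the interpolation path and comparing them to the endpoint spectra. A minimal sketch, assuming random matrices as stand-ins for trained weights:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-ins for a weight matrix from two checkpoints (illustrative).
W_a = rng.normal(size=(32, 32))
W_b = rng.normal(size=(32, 32))

def spectrum(W):
    """Singular values of W in descending order."""
    return np.linalg.svd(W, compute_uv=False)

s_a, s_b = spectrum(W_a), spectrum(W_b)
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    W = (1 - alpha) * W_a + alpha * W_b
    s = spectrum(W)
    # Distance of the path spectrum from each endpoint spectrum:
    # small distances throughout would indicate spectrum preservation.
    d_a = np.linalg.norm(s - s_a)
    d_b = np.linalg.norm(s - s_b)
    print(f"alpha={alpha:.2f}  dist_to_A={d_a:.2f}  dist_to_B={d_b:.2f}")
```

On trained checkpoints, the paper's finding would correspond to both distances staying small for same-optimizer pairs, and drifting smoothly from one endpoint to the other for cross-optimizer pairs.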