Symmetry in Overparameterized Networks Improves Optimization
A recent theoretical study posted on arXiv (2604.25150) argues that overparameterization in neural networks creates weight-space symmetries that facilitate optimization. These symmetries act as a diagonal preconditioner on the Hessian, so that better-conditioned minima exist within each equivalence class of functionally identical solutions. Overparameterization also increases the probability mass of global minima near typical initializations, making them easier to reach. Teacher-student experiments support the theory: as width grows, the Hessian trace decreases, condition numbers improve, and convergence accelerates. Together, the results offer a unified framework for understanding why overparameterization helps optimization in deep learning.
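The sketch below shows one way a width-sweep of this kind could be probed empirically; it is not the paper's protocol. The architecture (two-layer tanh MLPs), data size, optimizer settings, and the Hutchinson estimator for the Hessian trace are all illustrative assumptions.

```python
# Hedged sketch: a minimal teacher-student setup probing how student width
# affects the Hessian trace at (near-)interpolating solutions.
# All choices below (layer sizes, data size, optimizer, probe count) are
# illustrative assumptions, not the study's exact experimental protocol.
import torch
import torch.nn as nn

torch.manual_seed(0)

def mlp(width, d_in=10, d_out=1):
    return nn.Sequential(nn.Linear(d_in, width), nn.Tanh(), nn.Linear(width, d_out))

teacher = mlp(width=4)                      # small fixed teacher network
X = torch.randn(512, 10)
with torch.no_grad():
    y = teacher(X)                          # teacher labels

def hessian_trace(model, loss_fn, n_probes=20):
    """Hutchinson estimator: tr(H) ~ E[v^T H v] with Rademacher probes v."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    trace = 0.0
    for _ in range(n_probes):
        vs = [torch.randint_like(p, 2) * 2.0 - 1.0 for p in params]   # +/-1 probes
        gv = sum((g * v).sum() for g, v in zip(grads, vs))
        hvs = torch.autograd.grad(gv, params, retain_graph=True)      # Hessian-vector product
        trace += sum((hv * v).sum() for hv, v in zip(hvs, vs)).item()
    return trace / n_probes

for width in (4, 16, 64, 256):              # progressively overparameterized students
    student = mlp(width)
    opt = torch.optim.Adam(student.parameters(), lr=1e-2)
    loss_fn = lambda m: nn.functional.mse_loss(m(X), y)
    for step in range(2000):                # train to (near) zero loss
        opt.zero_grad()
        loss = loss_fn(student)
        loss.backward()
        opt.step()
    tr = hessian_trace(student, loss_fn)
    print(f"width={width:4d}  final_loss={loss.item():.2e}  trace(H)~{tr:.3f}")
```

Under the paper's claims, one would expect the reported Hessian trace to shrink as the student width grows, alongside faster convergence of the training loss.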
Key facts
- Overparameterization introduces additional weight-space symmetries in neural networks.
- Symmetries act as diagonal preconditioning on the Hessian.
- Better-conditioned minima exist within each equivalence class of functionally identical solutions (see the sketch after this list).
- Overparameterization increases the probability mass of global minima near typical initializations.
- Teacher-student network experiments validate theoretical predictions.
- As width increases, Hessian trace decreases and condition numbers improve.
- Convergence accelerates with increased width.
- The analysis provides a unified framework for understanding overparameterization benefits.
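The following toy example illustrates the equivalence-class idea in the simplest possible setting; the two-parameter model f(x) = a*b*x and the chosen rescaling factors are my own illustrative assumptions, not the paper's construction. The rescaling symmetry (a, b) -> (alpha*a, b/alpha) leaves the function unchanged but rescales the Hessian like a diagonal preconditioner, so different points in the same equivalence class have very different curvature.

```python
# Hedged sketch: within one equivalence class of functionally identical
# solutions, the rescaling symmetry (a, b) -> (alpha*a, b/alpha) preserves
# the product a*b (hence the function) but changes the Hessian, analogous
# to conjugating it by a diagonal preconditioner. Toy model and numbers
# are illustrative assumptions.
import torch

c = 4.0                                       # target slope: minima satisfy a*b = c

def loss(params):
    a, b = params
    return 0.5 * (a * b - c) ** 2

for alpha in (8.0, 2.0, 1.0):                 # alpha = 1 is the balanced point
    a = alpha * c ** 0.5
    b = (c ** 0.5) / alpha                    # a*b = c for every alpha: same function
    params = torch.tensor([a, b], requires_grad=True)
    H = torch.autograd.functional.hessian(loss, params)
    eigvals = torch.linalg.eigvalsh(H)
    print(f"alpha={alpha:4.1f}  a*b={a * b:.1f}  "
          f"trace(H)={H.trace().item():.2f}  eigvals={eigvals.tolist()}")
```

At every minimum one eigenvalue is zero (the flat symmetry direction); the remaining curvature is a^2 + b^2, which is smallest at the balanced point alpha = 1. This is the sense in which better-conditioned minima exist inside each equivalence class.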
Entities
Institutions
- arXiv