Flat Minima in Neural Networks Are an Illusion, Study Shows
A recent study posted to arXiv disputes the common assumption that flat minima in the loss landscapes of neural networks cause better generalization. The author shows that function-preserving reparameterizations can dramatically inflate the Hessian at any minimum without changing the network's predictions, suggesting that flatness is not a causal factor. Instead, the paper introduces "weakness," defined as the volume of completions compatible with the learned function in the learner's embodied language, as the factor that actually drives generalization. Weakness is invariant under reparameterization and is shown to be minimax-optimal under exchangeable demands. The paper further argues that PAC-Bayes bounds work because they correlate with weakness, with experiments on MNIST demonstrating advantages for large-batch generalization.
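The inflation mechanism is easy to reproduce in a toy setting. The sketch below is our own illustration, not the paper's code: because ReLU is positively homogeneous, multiplying a two-layer network's first-layer weights by a constant and dividing the second-layer weights by the same constant leaves every prediction, and hence the loss, unchanged, while a finite-difference estimate of the Hessian trace (a common sharpness proxy) changes sharply.

```python
import numpy as np

# Toy illustration (not the paper's code): f(x) = W2 @ relu(W1 @ x).
# Since relu(a*z) = a*relu(z) for a > 0, the map (W1, W2) -> (a*W1, W2/a)
# leaves the function unchanged but alters curvature in weight space.

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))            # toy inputs
y = rng.normal(size=(64, 1))            # toy regression targets
W1 = 0.5 * rng.normal(size=(8, 5))
W2 = 0.5 * rng.normal(size=(1, 8))

def loss(W1, W2):
    h = np.maximum(W1 @ X.T, 0.0)       # hidden activations, shape (8, 64)
    pred = (W2 @ h).T                   # predictions, shape (64, 1)
    return np.mean((pred - y) ** 2)

def hessian_trace(W1, W2, eps=1e-4):
    """Finite-difference estimate of the trace of the loss Hessian
    over all weights -- a simple sharpness proxy."""
    base, trace = loss(W1, W2), 0.0
    for P in (W1, W2):
        for idx in np.ndindex(P.shape):
            orig = P[idx]
            P[idx] = orig + eps; up = loss(W1, W2)
            P[idx] = orig - eps; down = loss(W1, W2)
            P[idx] = orig
            trace += (up - 2.0 * base + down) / eps ** 2
    return trace

a = 10.0                                 # rescaling factor
W1r, W2r = a * W1, W2 / a                # function-preserving reparameterization

print("loss     original vs rescaled:", loss(W1, W2), loss(W1r, W2r))
print("Hess tr  original vs rescaled:", hessian_trace(W1.copy(), W2.copy()),
      hessian_trace(W1r.copy(), W2r.copy()))
```

With a = 10, the second derivatives with respect to the output-layer weights grow by roughly a² = 100, so the rescaled copy of the exact same function looks about two orders of magnitude "sharper," consistent with the scale of inflation described above.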
Key facts
- Flat minima are not the cause of better generalization in neural networks.
- Function-preserving reparameterization can inflate the Hessian by two orders of magnitude.
- Weakness is defined as the volume of completions compatible with the learned function (a toy illustration follows this list).
- Weakness is reparameterization-invariant and minimax-optimal under exchangeable demands.
- PAC-Bayes bounds work because they correlate with weakness.
- The paper is published on arXiv with ID 2605.05209.
- Experiments were conducted on the MNIST dataset.
- The study challenges the theoretical rationale behind Sharpness-Aware Minimisation (SAM).
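To make "weakness" concrete, here is one possible toy reading; it is our assumption, not the paper's construction. Treat weakness as the fraction of parameter space, under some reference measure, whose induced function agrees with the learned one. The thresholded linear model, the AND target, and the uniform weight distribution below are all illustrative choices.

```python
import numpy as np

# Hedged toy reading of "weakness" (our interpretation, not the paper's code):
# the volume of parameter settings whose induced function matches the learned one.

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # all 2-bit inputs
learned = np.array([0, 0, 0, 1])                             # suppose training produced AND

# A tiny thresholded linear model: predict 1 when x @ w + b > 0.
samples = rng.uniform(-1.0, 1.0, size=(200_000, 3))          # (w1, w2, b) from a reference measure
preds = ((samples[:, :2] @ X.T) + samples[:, 2:3] > 0).astype(int)  # shape (200000, 4)
weakness = (preds == learned).all(axis=1).mean()

print(f"estimated weakness (volume of compatible parameters): {weakness:.4f}")

# Scaling (w, b) by any c > 0 leaves every prediction unchanged and maps the
# compatible set onto itself, so this volume is invariant under that
# function-preserving reparameterization -- the property the paper highlights.
```

This also gives one way to read the claim that PAC-Bayes bounds correlate with weakness: for a posterior concentrated on the compatible region, the KL(Q‖P) term in a PAC-Bayes bound behaves like the negative log of that region's prior volume, so larger-volume (weaker) solutions yield tighter bounds. The paper's own construction of weakness in the learner's "embodied language" may differ; this sketch only illustrates the invariance idea.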
Entities
Institutions
- arXiv