Worker Disagreement Reveals Sharp Directions in Local SGD
A recent theoretical study argues that disagreement among workers in Local SGD provides a cheap estimate of the dominant Hessian directions of the loss geometry in deep neural networks. The analysis shows that the covariance of the worker-average gaps is shaped jointly by stochastic-gradient noise and Hessian curvature, so workers disagree most along sharp, curvature-sensitive directions. This yields a Hessian-free estimator of the dominant subspace and helps characterize anisotropic loss landscapes, in which sharp directions coexist with a much flatter region. In experiments on MLPs, CNNs, and Transformers, the subspaces estimated from worker-average gaps capture a substantial share of the gradient component that lies in the dominant subspace. The paper is available on arXiv as 2605.27739.
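To make the idea concrete, here is a minimal sketch on a toy least-squares problem. It is not the paper's estimator or experimental setup. It assumes that a "worker-average gap" means each worker's parameters minus the average across workers just before synchronization, and it uses a problem whose minibatch gradient noise is naturally aligned with curvature. All function names, hyperparameters, and the overlap metric are hypothetical choices for illustration. The explicit Hessian is computed only to check the estimate.

```python
import numpy as np

def make_regression(rng, n=2000, dim=20, n_sharp=2):
    """Synthetic least-squares data whose Hessian has a few sharp directions."""
    scales = np.full(dim, 0.3)
    scales[:n_sharp] = 3.0  # large feature scale -> high curvature
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))  # random rotation
    x = (rng.standard_normal((n, dim)) * scales) @ q.T
    w_true = rng.standard_normal(dim)
    y = x @ w_true + 0.5 * rng.standard_normal(n)  # noisy labels
    return x, y

def local_sgd_gaps(x, y, rng, workers=8, local_steps=5, rounds=50,
                   lr=0.01, batch=8):
    """Run Local SGD; before each sync, record every worker's gap to the average."""
    n, dim = x.shape
    w_avg = np.zeros(dim)
    gaps = []
    for _ in range(rounds):
        # Every worker starts the round from the shared averaged model.
        w = np.tile(w_avg, (workers, 1))
        for _ in range(local_steps):
            for k in range(workers):
                idx = rng.integers(0, n, size=batch)
                resid = x[idx] @ w[k] - y[idx]
                w[k] -= lr * (x[idx].T @ resid) / batch  # minibatch SGD step
        w_avg = w.mean(axis=0)
        gaps.append(w - w_avg)  # worker-average gaps (assumed definition)
    return np.concatenate(gaps, axis=0)

def top_subspace_from_gaps(gaps, k):
    """Top-k principal directions of the gaps; no Hessian is needed here."""
    _, _, vt = np.linalg.svd(gaps, full_matrices=False)
    return vt[:k].T

def subspace_overlap(a, b):
    """Overlap in [0, 1] between two subspaces given by orthonormal columns."""
    return np.linalg.norm(a.T @ b, "fro") ** 2 / a.shape[1]

rng = np.random.default_rng(0)
x, y = make_regression(rng)
gaps = local_sgd_gaps(x, y, rng)
v_est = top_subspace_from_gaps(gaps, k=2)

# Reference only: the exact least-squares Hessian and its top-2 eigenvectors.
hessian = x.T @ x / len(x)
_, eigvecs = np.linalg.eigh(hessian)  # eigenvalues in ascending order
v_ref = eigvecs[:, -2:]
overlap = subspace_overlap(v_ref, v_est)
print(f"overlap with top-2 Hessian eigenspace: {overlap:.3f}")
```

An overlap near 1 means the gap-derived subspace nearly coincides with the sharpest Hessian directions. The cost is one SVD over parameter differences that Local SGD already produces at each synchronization, which is the sense in which such an estimate is cheap.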
Key facts
- arXiv paper 2605.27739
- Local SGD worker disagreement estimates dominant Hessian directions
- Worker-average gap covariance shaped by noise and curvature
- Hessian-free estimator of dominant subspace
- Experiments on MLPs, CNNs, and Transformers
- Anisotropic loss geometry with sharp and flat directions
- Gradients align with sharp directions but progress needs flat directions
- Cheap alternative to direct Hessian-based methods
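The covariance fact above can be illustrated with a back-of-the-envelope calculation. This is a simplified sketch under assumptions not taken from the paper: a quadratic loss with Hessian $H$, step size $\eta$, $K$ workers, $\tau$ local steps since the last synchronization, and independent zero-mean gradient noise $\xi_k^t$ with covariance $\Sigma$. Writing $\delta_k^t = w_k^t - \bar{w}^t$ for worker $k$'s gap to the average:

```latex
\begin{aligned}
\delta_k^{t+1} &= (I - \eta H)\,\delta_k^{t} - \eta\,\bigl(\xi_k^{t} - \bar{\xi}^{t}\bigr),
  \qquad \delta_k^{0} = 0, \\
\operatorname{Cov}\bigl(\delta_k^{\tau}\bigr) &= \eta^{2}\Bigl(1 - \tfrac{1}{K}\Bigr)
  \sum_{s=0}^{\tau-1} (I - \eta H)^{s}\,\Sigma\,(I - \eta H)^{s}.
\end{aligned}
```

If the noise covariance is roughly proportional to the curvature, $\Sigma \approx cH$ (a further assumption), then along a Hessian eigenvector with eigenvalue $\lambda$ the gap variance is approximately $\eta^{2}(1 - 1/K)\,c\,\lambda\,\tau$ when $\eta\lambda\tau \ll 1$. It grows with $\lambda$, so disagreement concentrates in the sharp directions.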
Entities
Institutions
- arXiv