Worker Disagreement Reveals Sharp Directions in Local SGD

other · 2026-05-28

A recent theoretical study indicates that disagreement among workers in Local SGD gives a cheap estimate of the dominant Hessian directions of the loss in deep neural networks. The central finding is that the covariance of the worker-average gaps, the gaps between each worker and the worker average, is shaped by both stochastic-gradient noise and Hessian curvature, so workers disagree most along sharp, curvature-sensitive directions. This yields a Hessian-free estimator of the dominant subspace and helps explain anisotropic loss landscapes, in which a few sharp directions coexist with a much flatter region. In experiments on MLPs, CNNs, and Transformers, the subspaces derived from worker-average gaps capture a significant portion of the gradient component that lies in the dominant subspace. The paper is available on arXiv as 2605.27739.
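The mechanism can be illustrated on a toy problem. The following is a minimal sketch, not the paper's method or experiments: it runs Local SGD on a synthetic quadratic loss with a known Hessian, collects each worker's gap from the average just before synchronization, and takes the top singular directions of those gaps as the subspace estimate. The function names, the hyperparameters, and the choice of gradient noise with covariance proportional to the Hessian are all assumptions made for illustration.

```python
import numpy as np

def local_sgd_gap_subspace(H, k, workers=8, rounds=200, local_steps=10,
                           lr=0.05, noise=0.1, seed=0):
    """Estimate a k-dim subspace from worker-average gaps on the toy loss 0.5 * x^T H x."""
    rng = np.random.default_rng(seed)
    d = H.shape[0]
    # ASSUMPTION: gradient noise has covariance proportional to H (a modeling choice).
    evals, evecs = np.linalg.eigh(H)
    H_sqrt = (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
    x_avg = rng.normal(size=d)
    gaps = []
    for _ in range(rounds):
        X = np.tile(x_avg, (workers, 1))      # every worker starts from the average
        for _ in range(local_steps):
            xi = noise * rng.normal(size=(workers, d)) @ H_sqrt
            X -= lr * (X @ H + xi)            # noisy gradient step, independent per worker
        x_avg = X.mean(axis=0)                # synchronization: average the workers
        gaps.append(X - x_avg)                # worker-average gaps for this round
    G = np.concatenate(gaps, axis=0)
    # Top right-singular vectors of the gap matrix = top eigenvectors of the gap covariance.
    _, _, Vt = np.linalg.svd(G, full_matrices=False)
    return Vt[:k].T

def subspace_overlap(U, V):
    """Mean squared cosine of the principal angles between two orthonormal bases (1 = same)."""
    return np.linalg.norm(U.T @ V, "fro") ** 2 / U.shape[1]

# Synthetic anisotropic Hessian: 3 sharp directions, 27 flat ones (hypothetical values).
rng = np.random.default_rng(1)
d, k = 30, 3
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
eigs = np.concatenate([[10.0, 8.0, 6.0], np.full(d - k, 0.1)])
H = (Q * eigs) @ Q.T

estimated = local_sgd_gap_subspace(H, k)
overlap = subspace_overlap(estimated, Q[:, :k])
print(f"overlap with true top-{k} Hessian subspace: {overlap:.3f}")
```

No Hessian-vector products are computed when forming the estimate. The Hessian is used only to define the toy loss and its noise, and to score the result. In this toy model the curvature-shaped noise is what concentrates disagreement in the sharp directions, so the outcome depends on that assumption.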

Key facts

arXiv paper 2605.27739
Local SGD worker disagreement estimates dominant Hessian directions
Worker-average gap covariance shaped by noise and curvature
Hessian-free estimator of dominant subspace
Experiments on MLPs, CNNs, and Transformers
Anisotropic loss geometry with sharp and flat directions
Gradients align with sharp directions but progress needs flat directions
Cheap alternative to direct Hessian-based methods
