GradsSharding: Serverless Federated Learning for Large Models
A recent arXiv paper (2604.22072) presents GradsSharding, a federated learning aggregation technique designed for serverless environments, addressing the memory constraints of existing frameworks such as lambda-FL and LIFL. Whereas traditional approaches partition clients among aggregators, GradsSharding partitions the gradient tensor itself into M shards; each shard is averaged independently by a serverless function that receives that shard from every client. Per-function memory is thus bounded at O(|θ|/M), independent of the number of clients, which enables aggregation of arbitrarily large models. The results are bit-identical to those of tree-based aggregation, so model accuracy is preserved by construction. The authors evaluate GradsSharding against lambda-FL and LIFL in high-performance computing experiments.
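As a concrete illustration of the scheme described above, here is a minimal sketch in Python. All names here (shard_bounds, aggregate_shard, federated_average) are hypothetical, not the paper's API, and the streaming running-sum accumulator is an assumption about how the O(|θ|/M) bound could be achieved, since buffering every client's shard at once would scale with the client count.

```python
import numpy as np

def shard_bounds(num_params: int, num_shards: int):
    """Yield (start, end) index ranges that partition a flat gradient
    of num_params elements into num_shards contiguous shards."""
    base, rem = divmod(num_params, num_shards)
    start = 0
    for s in range(num_shards):
        end = start + base + (1 if s < rem else 0)
        yield start, end
        start = end

def aggregate_shard(client_shards):
    """Role of one serverless function: average a single shard across
    all clients. The running sum holds one shard-sized buffer, so
    memory stays O(|theta|/M) regardless of how many clients report.
    (Streaming accumulation is an assumption, not the paper's stated
    mechanism.)"""
    total = None
    count = 0
    for shard in client_shards:  # client shards arrive one at a time
        total = shard.copy() if total is None else total + shard
        count += 1
    return total / count

def federated_average(client_grads, num_shards):
    """Driver: split each client's flat gradient into M shards,
    aggregate each shard independently, and reassemble the mean."""
    num_params = client_grads[0].size
    pieces = []
    for start, end in shard_bounds(num_params, num_shards):
        # Each iteration stands in for one independent serverless
        # invocation that pulls only its shard from every client.
        pieces.append(aggregate_shard(g[start:end] for g in client_grads))
    return np.concatenate(pieces)

# Tiny usage example: 3 clients, 8 parameters, M = 4 shards.
grads = [np.arange(8.0) + i for i in range(3)]
assert np.allclose(federated_average(grads, num_shards=4),
                   np.mean(grads, axis=0))
```

In a real deployment each loop iteration of federated_average would be a separate serverless invocation running in parallel; the sequential loop here only simulates them.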
Key facts
- Paper arXiv:2604.22072 proposes GradsSharding
- GradsSharding partitions gradient tensor into M shards
- Each shard averaged independently by a serverless function
- Per-function memory bounded at O(|θ|/M) (see the sizing sketch after this list)
- Enables aggregation of arbitrarily large models
- Bit-identical results to tree-based approaches
- Model accuracy invariant by construction
- Evaluated against lambda-FL and LIFL via HPC experiments
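As a back-of-the-envelope illustration of the memory bound (the numbers are hypothetical, not from the paper): a model with |θ| = 10⁹ float32 parameters, sharded into M = 1024 shards, would need roughly

```latex
% Hypothetical sizing, not from the paper:
% |\theta| = 10^9 float32 parameters (4 bytes each), M = 1024 shards.
\[
\text{per-function memory} \approx \frac{|\theta| \cdot 4\,\mathrm{B}}{M}
  = \frac{4 \times 10^{9}\,\mathrm{B}}{1024}
  \approx 3.9\,\mathrm{MB},
\]
% independent of the number of participating clients.
```

which is comfortably within typical serverless memory limits even though the full gradient (about 4 GB here) is not.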
Entities
Platforms
- arXiv
- AWS Lambda