GradsSharding: Serverless Federated Learning for Large Models
A recent arXiv paper (2604.22072) presents GradsSharding, a federated learning aggregation technique designed for serverless environments, addressing the memory constraints of existing frameworks such as lambda-FL and LIFL. Whereas traditional approaches partition clients among aggregators, GradsSharding partitions the gradient tensor itself into M shards; each shard is averaged independently by a serverless function that receives that shard from every client. Per-function memory is thus bounded at O(|θ|/M), independent of the number of clients, which enables aggregation of arbitrarily large models. The results are bit-identical to those of tree-based aggregation, so model accuracy is preserved by construction. The authors evaluate GradsSharding against lambda-FL and LIFL in high-performance computing experiments.
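As a concrete illustration of the scheme described above, here is a minimal sketch in Python. All names here (shard_bounds, aggregate_shard, federated_average) are hypothetical, not the paper's API, and the streaming running-sum accumulator is an assumption about how the O(|θ|/M) bound could be achieved, since buffering every client's shard at once would scale with the client count.

```python
import numpy as np

def shard_bounds(num_params: int, num_shards: int):
    """Yield (start, end) index ranges that partition a flat gradient
    of num_params elements into num_shards contiguous shards."""
    base, rem = divmod(num_params, num_shards)
    start = 0
    for s in range(num_shards):
        end = start + base + (1 if s < rem else 0)
        yield start, end
        start = end

def aggregate_shard(client_shards):
    """Role of one serverless function: average a single shard across
    all clients. The running sum holds one shard-sized buffer, so
    memory stays O(|theta|/M) regardless of how many clients report.
    (Streaming accumulation is an assumption, not the paper's stated
    mechanism.)"""
    total = None
    count = 0
    for shard in client_shards:  # client shards arrive one at a time
        total = shard.copy() if total is None else total + shard
        count += 1
    return total / count

def federated_average(client_grads, num_shards):
    """Driver: split each client's flat gradient into M shards,
    aggregate each shard independently, and reassemble the mean."""
    num_params = client_grads[0].size
    pieces = []
    for start, end in shard_bounds(num_params, num_shards):
        # Each iteration stands in for one independent serverless
        # invocation that pulls only its shard from every client.
        pieces.append(aggregate_shard(g[start:end] for g in client_grads))
    return np.concatenate(pieces)

# Tiny usage example: 3 clients, 8 parameters, M = 4 shards.
grads = [np.arange(8.0) + i for i in range(3)]
assert np.allclose(federated_average(grads, num_shards=4),
                   np.mean(grads, axis=0))
```

In a real deployment each loop iteration of federated_average would be a separate serverless invocation running in parallel; the sequential loop here only simulates them.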
Key facts
- Paper arXiv:2604.22072 proposes GradsSharding
- GradsSharding partitions gradient tensor into M shards
- Each shard averaged independently by a serverless function
- Per-function memory bounded at O(|θ|/M) (see the sizing sketch after this list)
- Enables aggregation of arbitrarily large models
- Bit-identical results to tree-based approaches
- Model accuracy invariant by construction
- Evaluated against lambda-FL and LIFL via HPC experiments
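As a back-of-the-envelope illustration of the memory bound (the numbers are hypothetical, not from the paper): a model with |θ| = 10⁹ float32 parameters, sharded into M = 1024 shards, would need roughly

```latex
% Hypothetical sizing, not from the paper:
% |\theta| = 10^9 float32 parameters (4 bytes each), M = 1024 shards.
\[
\text{per-function memory} \approx \frac{|\theta| \cdot 4\,\mathrm{B}}{M}
  = \frac{4 \times 10^{9}\,\mathrm{B}}{1024}
  \approx 3.9\,\mathrm{MB},
\]
% independent of the number of participating clients.
```

which is comfortably within typical serverless memory limits even though the full gradient (about 4 GB here) is not.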
Entities
Platforms
- arXiv
- AWS Lambda