ARTFEED — Contemporary Art Intelligence

GradsSharding: Serverless Federated Learning for Large Models

other · 2026-04-27

A recent arXiv preprint (2604.22072) presents GradsSharding, a federated-learning aggregation technique tailored to serverless environments that addresses the memory constraints of existing frameworks such as lambda-FL and LIFL. Rather than distributing clients among aggregators, GradsSharding partitions the gradient tensor into M shards; each shard is averaged independently by a serverless function that collects that shard's slice from every client. Per-function memory is thus bounded at O(|θ|/M), independent of the number of clients, which allows models of arbitrary size to be aggregated. The results are bit-identical to those of tree-based approaches, so model accuracy is preserved by construction. The authors evaluate GradsSharding against lambda-FL and LIFL in high-performance computing experiments.
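The partition-and-average scheme can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation: the function name `shard_average`, the shard count `M = 4`, and the toy gradient shapes are illustrative assumptions.

```python
import numpy as np

def shard_average(client_grads: np.ndarray, M: int) -> np.ndarray:
    """Average client gradients shard by shard.

    client_grads: shape (n_clients, n_params) -- flattened gradients.
    Each of the M shards would run as its own serverless function,
    holding only its slice of the parameters rather than the full tensor.
    """
    n_params = client_grads.shape[1]
    bounds = np.linspace(0, n_params, M + 1, dtype=int)  # shard boundaries
    shard_means = [
        client_grads[:, lo:hi].mean(axis=0)  # one "function" per shard
        for lo, hi in zip(bounds[:-1], bounds[1:])
    ]
    return np.concatenate(shard_means)

rng = np.random.default_rng(0)
grads = rng.standard_normal((8, 100))   # 8 clients, 100 parameters (toy sizes)
sharded = shard_average(grads, M=4)
full = grads.mean(axis=0)               # single-aggregator baseline
print(np.allclose(sharded, full))       # → True
```

Because each parameter is only ever averaged by one shard's function, concatenating the shard averages recovers the same result as a single monolithic aggregator.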

Key facts

  • Paper arXiv:2604.22072 proposes GradsSharding
  • GradsSharding partitions gradient tensor into M shards
  • Each shard averaged independently by a serverless function
  • Per-function memory bounded at O(|θ|/M)
  • Enables aggregation of arbitrarily large models
  • Bit-identical results to tree-based approaches
  • Model accuracy invariant by construction
  • Evaluated against lambda-FL and LIFL via HPC experiments
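To make the memory bound in the bullets concrete, here is a streaming aggregator for a single shard; the class name `ShardAggregator`, the shard size, and the client count are illustrative assumptions, and a real serverless function would receive shard updates over the network rather than from an in-memory array:

```python
import numpy as np

class ShardAggregator:
    """One serverless function's view: accumulate a single shard of
    size |theta| / M, streaming over clients one at a time. Peak state
    is one running sum plus one incoming update, regardless of how
    many clients participate (hypothetical sketch)."""

    def __init__(self, shard_size: int):
        self.sum = np.zeros(shard_size)   # O(|theta| / M) state
        self.count = 0

    def add(self, client_shard: np.ndarray) -> None:
        self.sum += client_shard          # fold in one client's shard
        self.count += 1

    def result(self) -> np.ndarray:
        return self.sum / self.count      # averaged shard

agg = ShardAggregator(shard_size=25)       # e.g. |theta| = 100, M = 4
rng = np.random.default_rng(1)
updates = rng.standard_normal((1000, 25))  # 1000 clients; state never grows
for u in updates:
    agg.add(u)
print(np.allclose(agg.result(), updates.mean(axis=0)))  # → True
```

The aggregator's state is fixed at one shard-sized buffer no matter how many clients send updates, which is the sense in which per-function memory is independent of the client count.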

Entities

Institutions

  • arXiv
  • AWS Lambda

Sources