ARTFEED — Contemporary Art Intelligence

Cloudless-Training: Efficient Geo-Distributed ML Framework

ai-technology · 2026-04-29

Cloudless-Training, a framework introduced in arXiv:2303.05330, aims to make machine learning training across multiple geographic regions more efficient. It tackles two problems: inefficient elastic scheduling of cloud resources across regions, and the communication overhead of training over wide area networks (WANs), where bandwidth is low and fluctuates considerably. Its two-layer architecture, made up of a control plane and a physical training plane, supports elastic scheduling and communication in a serverless manner. It also includes an elastic scheduling strategy that adapts training workflows to the heterogeneity of available cloud resources. The work is relevant to emerging machine learning applications such as large model training and federated learning.
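To give a sense of why WAN communication matters, here is a rough back-of-the-envelope sketch in Python. The model size, the link speeds, and the helper name `sync_seconds` are illustrative assumptions, not figures or code from the paper.

```python
def sync_seconds(payload_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Time to send one payload over a link, ignoring latency and protocol overhead."""
    return payload_bytes * 8 / bandwidth_bits_per_s


# Hypothetical model: 100 million float32 parameters, 4 bytes each.
MODEL_BYTES = 100e6 * 4

# Assumed link speeds, chosen only for illustration.
lan_seconds = sync_seconds(MODEL_BYTES, 10e9)   # 10 Gbit/s inside one region
wan_seconds = sync_seconds(MODEL_BYTES, 100e6)  # 100 Mbit/s between regions

print(f"intra-region sync: {lan_seconds:.2f} s")
print(f"cross-region sync: {wan_seconds:.2f} s")
```

Under these assumed numbers, one full exchange of parameters takes about a hundred times longer across regions than within one, which is the kind of gap a geo-distributed framework has to work around.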

Key facts

  • Cloudless-Training is a framework for geo-distributed ML training.
  • It addresses elastic scheduling and WAN communication challenges.
  • Uses a two-layer architecture with control and physical training planes.
  • Supports serverless elastic scheduling and communication.
  • Its elastic scheduling strategy adapts to the heterogeneity of available cloud resources.
  • Targets large model training and federated learning.
  • Published on arXiv with ID 2303.05330.
  • Aims to improve resource utilization and training performance.
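
One simple way to picture scheduling that adapts to heterogeneous resources is to split each global batch in proportion to what each region can process. The Python sketch below is an assumption-laden illustration, not the scheduling strategy from the paper: the `Region` type, the `throughput` field, the region names, and `allocate_batch` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    throughput: float  # hypothetical measure: samples per second the region can train


def allocate_batch(regions: list[Region], total_batch: int) -> dict[str, int]:
    """Split a global batch across regions in proportion to their throughput."""
    total = sum(r.throughput for r in regions)
    if total <= 0:
        raise ValueError("at least one region must have positive throughput")

    # Ideal (fractional) share per region, then round down.
    exact = {r.name: total_batch * r.throughput / total for r in regions}
    shares = {name: int(value) for name, value in exact.items()}

    # Hand out the samples lost to rounding, largest fractional part first.
    leftover = total_batch - sum(shares.values())
    by_fraction = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for name in by_fraction[:leftover]:
        shares[name] += 1
    return shares


# Illustrative regions with made-up throughput values.
regions = [
    Region("region-a", 300.0),
    Region("region-b", 100.0),
    Region("region-c", 100.0),
]
plan = allocate_batch(regions, 512)
print(plan)
```

The faster region receives the larger share, so no region sits idle waiting for a slower one to finish an equal-sized slice. A real scheduler would also have to account for where the training data already lives and for changing network conditions.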

Entities

Institutions

  • arXiv
