Piper Framework Optimizes Large-Scale MoE Training on HPC Platforms
A novel framework known as Piper tackles the performance limitations encountered when training Mixture-of-Experts (MoE) models on high-performance computing (HPC) systems. While MoE architectures are gaining traction in frontier AI models because of their efficiency, training them at scale faces obstacles such as large memory footprints, heavy communication across heterogeneous networks, and significant workload imbalance. Piper employs a mathematical model, validated through micro-benchmarking, code instrumentation, and hardware profiling, to quantify the memory, computation, and communication requirements of different parallelization strategies. It pinpoints critical bottlenecks, including all-to-all latency introduced by expert parallelism and low GPU utilization caused by skinny, unevenly sized GEMMs. Building on this resource modeling, Piper improves training efficiency through effective pipelined hybrid parallelism strategies.
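To make the idea of an analytical resource model concrete, here is a minimal sketch in Python of how per-GPU memory, compute, and all-to-all communication for one MoE layer could be estimated under tensor and expert parallelism. The formulas, layer shape, and hardware constants (link bandwidth, peak TFLOP/s) are illustrative assumptions, not Piper's actual model.

```python
from dataclasses import dataclass

@dataclass
class MoELayerConfig:
    hidden: int               # model hidden size
    ffn_hidden: int           # expert FFN hidden size
    num_experts: int          # total experts in the layer
    top_k: int                # experts routed per token
    tokens: int               # tokens per micro-batch
    bytes_per_param: int = 2  # bf16

def estimate_moe_layer(cfg, tp, ep, link_bw_gbs=50.0, gpu_tflops=300.0):
    """Rough per-GPU estimates for one MoE layer under tensor (tp) and
    expert (ep) parallelism. Simplified illustration only."""
    # Expert weights: two GEMMs (hidden -> ffn_hidden -> hidden) per expert,
    # sharded over tensor- and expert-parallel ranks.
    params_per_expert = 2 * cfg.hidden * cfg.ffn_hidden
    local_experts = cfg.num_experts / ep
    weight_bytes = params_per_expert * local_experts / tp * cfg.bytes_per_param

    # Compute: each token visits top_k experts; 2 FLOPs per parameter per token,
    # assuming perfectly balanced routing across expert-parallel ranks.
    flops = 2 * cfg.tokens * cfg.top_k * params_per_expert / (tp * ep)
    compute_ms = flops / (gpu_tflops * 1e12) * 1e3

    # All-to-all: token activations dispatched to expert ranks and gathered back;
    # only the (ep-1)/ep fraction leaves the local rank.
    a2a_bytes = 2 * cfg.tokens * cfg.top_k * cfg.hidden * cfg.bytes_per_param * (ep - 1) / ep
    a2a_ms = a2a_bytes / (link_bw_gbs * 1e9) * 1e3

    return {"weight_MB": weight_bytes / 2**20,
            "compute_ms": compute_ms,
            "all_to_all_ms": a2a_ms}

if __name__ == "__main__":
    cfg = MoELayerConfig(hidden=4096, ffn_hidden=14336, num_experts=64,
                         top_k=2, tokens=8192)
    for ep in (8, 16, 32):
        print(f"ep={ep}:", estimate_moe_layer(cfg, tp=2, ep=ep))
```

Sweeping such a model over candidate (tensor, expert, pipeline) degrees is one way a framework can compare hybrid parallelism strategies before running anything on the machine.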
Key facts
- Piper is a framework for efficient large-scale MoE training.
- MoE architectures are adopted by frontier models for reduced cost.
- Training MoE on HPC faces memory, communication, and imbalance issues.
- A mathematical model quantifies memory, compute, and communication requirements.
- Bottlenecks include expert-parallel all-to-all latency, insufficient compute-communication overlap, and low GPU utilization from skinny GEMMs (see the sketch after this list).
- Piper uses resource modeling for pipelined hybrid parallelism.
- The work is published on arXiv with ID 2605.05049.
- The framework aims to improve training efficiency on HPC platforms.
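The skinny-GEMM utilization problem mentioned above can be illustrated with a simple roofline-style estimate: when few tokens are routed to an expert, the resulting GEMM has a small M dimension, becomes memory-bound, and reaches only a fraction of peak throughput. The hardware constants and shapes below are assumptions for illustration, not figures from the paper.

```python
def gemm_roofline_utilization(m, n, k, peak_tflops=300.0, mem_bw_gbs=2000.0,
                              bytes_per_el=2):
    """Roofline-style estimate of the fraction of peak throughput an
    (m x k) @ (k x n) GEMM can reach on an assumed GPU."""
    flops = 2 * m * n * k
    bytes_moved = (m * k + k * n + m * n) * bytes_per_el
    intensity = flops / bytes_moved                              # FLOPs per byte
    achievable = min(peak_tflops * 1e12, intensity * mem_bw_gbs * 1e9)
    return achievable / (peak_tflops * 1e12)

# Fewer tokens per expert -> skinnier GEMM -> lower utilization.
for tokens_per_expert in (32, 256, 4096):
    u = gemm_roofline_utilization(m=tokens_per_expert, n=14336, k=4096)
    print(f"{tokens_per_expert:5d} tokens/expert -> ~{u:.0%} of peak")
```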
Entities
Institutions
- arXiv