Piper Framework Optimizes Large-Scale MoE Training on HPC Platforms
A novel framework known as Piper tackles the performance limitations encountered when training Mixture-of-Experts (MoE) models on high-performance computing (HPC) systems. While MoE architectures are gaining traction in frontier AI models because of their efficiency, training them at scale faces obstacles such as large memory footprints, heavy communication across heterogeneous networks, and significant workload imbalance. Piper employs a mathematical model, validated through micro-benchmarking, code instrumentation, and hardware profiling, to quantify the memory, computation, and communication requirements of different parallelization strategies. It pinpoints critical bottlenecks, including all-to-all latency introduced by expert parallelism and low GPU utilization caused by skinny, unevenly sized GEMMs. Building on this resource modeling, Piper improves training efficiency through effective pipelined hybrid parallelism strategies.
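To make the idea of an analytical resource model concrete, here is a minimal sketch in Python of how per-GPU memory, compute, and all-to-all communication for one MoE layer could be estimated under tensor and expert parallelism. The formulas, layer shape, and hardware constants (link bandwidth, peak TFLOP/s) are illustrative assumptions, not Piper's actual model.

```python
from dataclasses import dataclass

@dataclass
class MoELayerConfig:
    hidden: int               # model hidden size
    ffn_hidden: int           # expert FFN hidden size
    num_experts: int          # total experts in the layer
    top_k: int                # experts routed per token
    tokens: int               # tokens per micro-batch
    bytes_per_param: int = 2  # bf16

def estimate_moe_layer(cfg, tp, ep, link_bw_gbs=50.0, gpu_tflops=300.0):
    """Rough per-GPU estimates for one MoE layer under tensor (tp) and
    expert (ep) parallelism. Simplified illustration only."""
    # Expert weights: two GEMMs (hidden -> ffn_hidden -> hidden) per expert,
    # sharded over tensor- and expert-parallel ranks.
    params_per_expert = 2 * cfg.hidden * cfg.ffn_hidden
    local_experts = cfg.num_experts / ep
    weight_bytes = params_per_expert * local_experts / tp * cfg.bytes_per_param

    # Compute: each token visits top_k experts; 2 FLOPs per parameter per token,
    # assuming perfectly balanced routing across expert-parallel ranks.
    flops = 2 * cfg.tokens * cfg.top_k * params_per_expert / (tp * ep)
    compute_ms = flops / (gpu_tflops * 1e12) * 1e3

    # All-to-all: token activations dispatched to expert ranks and gathered back;
    # only the (ep-1)/ep fraction leaves the local rank.
    a2a_bytes = 2 * cfg.tokens * cfg.top_k * cfg.hidden * cfg.bytes_per_param * (ep - 1) / ep
    a2a_ms = a2a_bytes / (link_bw_gbs * 1e9) * 1e3

    return {"weight_MB": weight_bytes / 2**20,
            "compute_ms": compute_ms,
            "all_to_all_ms": a2a_ms}

if __name__ == "__main__":
    cfg = MoELayerConfig(hidden=4096, ffn_hidden=14336, num_experts=64,
                         top_k=2, tokens=8192)
    for ep in (8, 16, 32):
        print(f"ep={ep}:", estimate_moe_layer(cfg, tp=2, ep=ep))
```

Sweeping such a model over candidate (tensor, expert, pipeline) degrees is one way a framework can compare hybrid parallelism strategies before running anything on the machine.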
Key facts
- Piper is a framework for efficient large-scale MoE training.
- MoE architectures are adopted by frontier models for reduced cost.
- Training MoE on HPC faces memory, communication, and imbalance issues.
- A mathematical model quantifies memory, compute, and communication requirements.
- Bottlenecks include expert-parallel all-to-all latency, insufficient compute-communication overlap, and low GPU utilization from skinny GEMMs (see the sketch after this list).
- Piper uses resource modeling for pipelined hybrid parallelism.
- The work is published on arXiv with ID 2605.05049.
- The framework aims to improve training efficiency on HPC platforms.
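The skinny-GEMM utilization problem mentioned above can be illustrated with a simple roofline-style estimate: when few tokens are routed to an expert, the resulting GEMM has a small M dimension, becomes memory-bound, and reaches only a fraction of peak throughput. The hardware constants and shapes below are assumptions for illustration, not figures from the paper.

```python
def gemm_roofline_utilization(m, n, k, peak_tflops=300.0, mem_bw_gbs=2000.0,
                              bytes_per_el=2):
    """Roofline-style estimate of the fraction of peak throughput an
    (m x k) @ (k x n) GEMM can reach on an assumed GPU."""
    flops = 2 * m * n * k
    bytes_moved = (m * k + k * n + m * n) * bytes_per_el
    intensity = flops / bytes_moved                              # FLOPs per byte
    achievable = min(peak_tflops * 1e12, intensity * mem_bw_gbs * 1e9)
    return achievable / (peak_tflops * 1e12)

# Fewer tokens per expert -> skinnier GEMM -> lower utilization.
for tokens_per_expert in (32, 256, 4096):
    u = gemm_roofline_utilization(m=tokens_per_expert, n=14336, k=4096)
    print(f"{tokens_per_expert:5d} tokens/expert -> ~{u:.0%} of peak")
```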
Entities
Institutions
- arXiv