ARTFEED — Contemporary Art Intelligence

Piper Framework Optimizes Large-Scale MoE Training on HPC Platforms

ai-technology · 2026-05-07

Piper is a new framework that tackles the performance bottlenecks of training Mixture-of-Experts (MoE) models on high-performance computing (HPC) systems. MoE architectures are gaining traction in frontier AI models because of their efficiency, yet training them at scale is hampered by large memory footprints, heavy communication across heterogeneous networks, and significant load imbalance. Piper uses a mathematical model to quantify the memory, computation, and communication requirements of different parallelization strategies, validated through micro-benchmarking, code instrumentation, and hardware profiling. The model pinpoints critical bottlenecks, including the all-to-all latency introduced by expert parallelism and the low GPU utilization caused by skinny, load-imbalanced GEMMs. Building on this resource model, Piper improves training efficiency through well-chosen pipelined hybrid parallelism strategies.
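To make the resource-model idea concrete, here is a minimal back-of-the-envelope sketch in Python for a single MoE layer. Every name and formula below is an illustrative assumption, not Piper's actual model:

    # Illustrative per-layer resource estimate for MoE training.
    # All formulas are simplifying assumptions, not Piper's model.

    def moe_layer_costs(
        hidden,             # model hidden size
        ffn_hidden,         # expert FFN hidden size
        n_experts,          # experts in the layer
        top_k,              # experts activated per token
        tokens,             # tokens per micro-batch on this rank
        ep,                 # expert-parallel degree
        bytes_per_param=2,  # bf16 weights
    ):
        # Memory: each rank stores n_experts / ep experts' weights.
        params_per_expert = 2 * hidden * ffn_hidden  # up + down projections
        mem = (n_experts // ep) * params_per_expert * bytes_per_param

        # Compute: each routed token runs two GEMMs per visited expert.
        flops = tokens * top_k * 2 * (2 * hidden * ffn_hidden)

        # Communication: all-to-all dispatch and combine each move the
        # routed tokens' activations between expert-parallel ranks.
        a2a = 2 * tokens * top_k * hidden * bytes_per_param

        return {"memory_bytes": mem, "flops": flops, "alltoall_bytes": a2a}

    # Example: a 64-expert, top-2 layer sharded over 8 expert-parallel ranks.
    print(moe_layer_costs(hidden=4096, ffn_hidden=14336,
                          n_experts=64, top_k=2, tokens=8192, ep=8))

Comparing these three quantities across candidate parallel layouts is the kind of trade-off such a model lets one weigh before launching a run.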

Key facts

  • Piper is a framework for efficient large-scale MoE training.
  • Frontier models adopt MoE architectures to reduce compute cost.
  • Training MoE on HPC faces memory, communication, and imbalance issues.
  • A mathematical model quantifies memory, compute, and communication requirements.
  • Bottlenecks include all-to-all latency, insufficient communication-computation overlap, and low GPU utilization (see the sketch after this list).
  • Piper uses resource modeling to drive pipelined hybrid parallelism.
  • The work is published on arXiv with ID 2605.05049.
  • The framework aims to improve training efficiency on HPC platforms.
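As a rough illustration of the all-to-all bottleneck listed above, the toy alpha-beta latency model below shows how expert load skew inflates the collective, since the slowest rank pair gates each exchange round. The constants are placeholder assumptions, not measurements from the paper:

    # Toy alpha-beta estimate of an expert-parallel all-to-all.
    # Constants are illustrative placeholders, not paper measurements.

    ALPHA_S = 5e-6              # assumed per-message latency (5 us)
    BETA_S_PER_BYTE = 1 / 25e9  # assumed 25 GB/s effective link bandwidth

    def alltoall_time(ranks, max_bytes_per_pair):
        # Pairwise exchange: (ranks - 1) rounds, each gated by the
        # largest message sent in that round.
        return (ranks - 1) * (ALPHA_S + max_bytes_per_pair * BETA_S_PER_BYTE)

    balanced = alltoall_time(ranks=64, max_bytes_per_pair=256 * 1024)
    skewed = alltoall_time(ranks=64, max_bytes_per_pair=4 * 256 * 1024)  # one 4x-hot expert
    print(f"balanced: {balanced * 1e3:.2f} ms, skewed: {skewed * 1e3:.2f} ms")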

Entities

Institutions

  • arXiv

Sources

  • arXiv:2605.05049 (https://arxiv.org/abs/2605.05049)