ARTFEED — Contemporary Art Intelligence

Network Topologies for Cost-Effective MoE LLM Serving

ai-technology · 2026-05-04

A new arXiv preprint (2605.00254) challenges the assumption that expensive high-bandwidth scale-up networks are necessary for mixture-of-experts (MoE) large language model (LLM) serving. The authors present the first systematic cross-layer analysis of network cost-effectiveness, comparing four representative XPU (e.g., GPU/TPU) interconnect topologies: scale-up, scale-out, 3D torus, and 3D full-mesh. They find that lower-cost switchless topologies improve cost-effectiveness by 20.6-56.2% over scale-up across all scenarios, and that the 3D full-mesh topology is Pareto-optimal in the performance-cost tradeoff. Current scale-up link bandwidths are over-provisioned: reducing them improves throughput per unit of network cost. The results suggest that costly scale-up infrastructure investments may not be strictly necessary for MoE LLM serving.
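
To make the Pareto-optimality claim concrete, the sketch below computes a performance-cost frontier over candidate topologies. It is a minimal Python illustration: the pareto_frontier helper and all cost/throughput numbers are invented placeholders, not figures from the paper; only the dominance test reflects how such a frontier is derived.

    # Toy Pareto-frontier computation over (name, cost, throughput) points.
    # All numbers are illustrative placeholders, not results from the paper.

    def pareto_frontier(points):
        """Keep points that no other point dominates, where "dominates"
        means no costlier AND no slower, and strictly better on one axis."""
        frontier = []
        for name, cost, tput in points:
            dominated = any(
                c <= cost and t >= tput and (c < cost or t > tput)
                for _, c, t in points
            )
            if not dominated:
                frontier.append(name)
        return frontier

    topologies = [
        # (name, relative network cost, relative serving throughput)
        ("scale-up",     1.00, 1.00),
        ("scale-out",    0.70, 0.85),
        ("3D torus",     0.55, 0.80),
        ("3D full-mesh", 0.65, 0.95),
    ]

    for name, cost, tput in topologies:
        print(f"{name:12s} throughput/cost = {tput / cost:.2f}")
    print("Pareto frontier:", pareto_frontier(topologies))

With these placeholder numbers, scale-out is dominated by 3D full-mesh (both costlier and slower) and drops off the frontier, while the other three topologies each trade cost against throughput.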

Key facts

  • arXiv paper 2605.00254 analyzes network cost-effectiveness for MoE LLM serving.
  • Four topologies compared: scale-up, scale-out, 3D torus, and 3D full-mesh.
  • Switchless topologies improve cost-effectiveness by 20.6-56.2% over scale-up.
  • 3D full-mesh is Pareto-optimal in the performance-cost tradeoff.
  • Current scale-up link bandwidths are over-provisioned; reducing them improves throughput per unit of network cost (see the sketch after this list).
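
The over-provisioning point can be illustrated with a toy model, sketched below. The throughput and network_cost functions and every constant in them are assumptions for illustration, not measurements from the paper: serving throughput saturates once links can carry the MoE all-to-all expert traffic, while link cost keeps growing with provisioned bandwidth, so throughput per unit cost peaks well below the maximum bandwidth.

    # Toy over-provisioning model; all constants are assumed for
    # illustration, not taken from the paper.

    def throughput(bw, compute_limit=100.0, traffic_per_token=0.8):
        # Tokens/s is the smaller of the compute bound and the network
        # bound (links must carry the all-to-all exchange per token).
        return min(compute_limit, bw / traffic_per_token)

    def network_cost(bw, base=1.0, per_bw_unit=0.02):
        # Link cost grows roughly linearly with provisioned bandwidth.
        return base + per_bw_unit * bw

    best = max(range(10, 401, 10), key=lambda b: throughput(b) / network_cost(b))
    for bw in (40, best, 400):
        ratio = throughput(bw) / network_cost(bw)
        print(f"bw={bw:3d}  throughput={throughput(bw):6.1f}  perf/cost={ratio:5.1f}")

In this sketch throughput per unit cost peaks where the network bound meets the compute bound (bw = 80); provisioning 5x more bandwidth leaves throughput flat while perf/cost drops by more than 3x.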

Entities

Institutions

  • arXiv

Sources

  • arXiv:2605.00254 (https://arxiv.org/abs/2605.00254)