GEM: GPU-Variability-Aware Expert Mapping for MoE Models

ai-technology · 2026-05-20

GEM (GPU-variability-aware Expert Mapping) is a newly introduced framework that addresses a performance bottleneck in serving Mixture-of-Experts (MoE) models. MoE models use smaller experts, activate only a subset of them for each token, and distribute the experts across multiple GPUs. Because the GPUs process each step in lock-step and must wait at a synchronization barrier, the slowest GPU, known as the straggler, determines the overall speed. Stragglers arise when frequently used experts are placed on the same GPU or on slower GPUs. Prior approaches balanced the token load across GPUs but ignored variability in GPU speed, so popular experts often ended up on slower GPUs. GEM accounts for GPU speed differences when mapping experts to GPUs, which reduces the impact of stragglers. The paper is available on arXiv (2605.19945).
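To make the straggler effect concrete, here is a minimal Python sketch of lock-step latency under a given expert mapping. It is an illustration only, not code from the GEM paper: the function name `step_time`, the cost model (tokens divided by relative GPU speed), and all numbers are assumptions chosen for the example.

```python
def step_time(mapping, expert_load, gpu_speed):
    """Return the lock-step latency: the finish time of the slowest GPU.

    mapping[e]     -> index of the GPU hosting expert e
    expert_load[e] -> tokens routed to expert e in this step
    gpu_speed[g]   -> relative throughput of GPU g (tokens per unit time)
    """
    tokens_per_gpu = [0.0] * len(gpu_speed)
    for expert, gpu in enumerate(mapping):
        tokens_per_gpu[gpu] += expert_load[expert]
    # Every GPU waits at the barrier, so the slowest one sets the step time.
    return max(t / s for t, s in zip(tokens_per_gpu, gpu_speed))


# Hypothetical workload: four experts, two GPUs, GPU 1 is 20% slower.
expert_load = [400, 300, 200, 100]
gpu_speed = [1.0, 0.8]

# Token-balanced mapping: 500 tokens on each GPU, speed ignored.
token_balanced = [0, 1, 1, 0]
# Speed-aware mapping: 600 tokens on the fast GPU, 400 on the slow one.
speed_aware = [0, 1, 0, 1]

balanced_time = step_time(token_balanced, expert_load, gpu_speed)
aware_time = step_time(speed_aware, expert_load, gpu_speed)
print(f"token-balanced: {balanced_time:.1f}, speed-aware: {aware_time:.1f}")
```

In this toy setup the token-balanced mapping is limited by the slower GPU, while shifting some load to the faster GPU shortens the step even though the token counts are no longer equal.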

Key facts

GEM stands for GPU-variability-aware Expert Mapping.
It targets Mixture-of-Experts (MoE) models.
MoE models activate a subset of experts per token.
The synchronization barrier makes the slowest (straggler) GPU limit overall performance.
Stragglers arise from unbalanced expert placement and GPU variability.
Prior work balanced token distribution but ignored GPU variability.
GEM maps experts considering GPU speed differences.
Paper available on arXiv with ID 2605.19945.
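
One simple way to take GPU speed into account when placing experts is a greedy heuristic: place the most heavily loaded experts first, each on the GPU that would finish earliest after receiving it. The sketch below is a generic longest-processing-time-style heuristic written for illustration. It is not the algorithm described in the GEM paper, and the function name, the `experts_per_gpu` capacity limit, and the sample numbers are all hypothetical.

```python
def greedy_speed_aware_mapping(expert_load, gpu_speed, experts_per_gpu):
    """Assign experts to GPUs, heaviest first, by projected finish time.

    Generic greedy heuristic for illustration; NOT the GEM algorithm.
    Returns a list where mapping[e] is the GPU chosen for expert e.
    """
    num_gpus = len(gpu_speed)
    if len(expert_load) > num_gpus * experts_per_gpu:
        raise ValueError("not enough expert slots for all experts")

    tokens = [0.0] * num_gpus            # tokens assigned to each GPU so far
    slots = [experts_per_gpu] * num_gpus  # remaining expert slots per GPU
    mapping = [None] * len(expert_load)

    # Visit experts from most to least popular.
    order = sorted(range(len(expert_load)),
                   key=lambda e: expert_load[e], reverse=True)
    for e in order:
        candidates = [g for g in range(num_gpus) if slots[g] > 0]
        # Pick the GPU whose finish time would be lowest with this expert added.
        best = min(candidates,
                   key=lambda g: (tokens[g] + expert_load[e]) / gpu_speed[g])
        mapping[e] = best
        tokens[best] += expert_load[e]
        slots[best] -= 1
    return mapping


# Hypothetical workload: four experts, two GPUs, GPU 1 is 20% slower.
expert_load = [400, 300, 200, 100]
gpu_speed = [1.0, 0.8]
mapping = greedy_speed_aware_mapping(expert_load, gpu_speed, experts_per_gpu=2)
print(mapping)
```

The heuristic is cheap to run and easy to reason about, but it makes no optimality guarantee; it only shows how a measured per-GPU speed can enter the placement decision alongside expert popularity.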
