ARTFEED — Contemporary Art Intelligence

RaMP Boosts MoE Kernel Performance by 22%

ai-technology · 2026-04-30

RaMP is a routing-aware dispatch framework for Mixture-of-Experts (MoE) inference. It selects kernel configurations based on both the batch size and the distribution of tokens across experts, achieving a kernel speedup of up to 1.22x. Current production systems typically dispatch on batch size alone, which leaves 10-70% of kernel throughput unrealized.

RaMP uses a performance-region analysis, derived from hardware constants, to predict when each optimization will help. These predictions held for all eight tested architectures, including three previously unseen ones. A four-parameter wave cost model then selects the fastest configuration from the runtime expert histogram, with a mean regret of only 0.93% relative to exhaustive search. The model is fitted from just 10-24 minutes of one-time profiling per model.

Because the cost model relies solely on CTA grid geometry, it is kernel-agnostic: applied to Alpha-MoE, it delivers a 1.14x speedup without any source-code changes. Paired with a co-designed CuTe DSL kernel that exposes 134-268 polymorphic configurations, RaMP reaches a 1.22x kernel speedup.
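To make the idea of dispatching from the expert histogram concrete, here is a minimal Python sketch of a wave-based cost model. It is not RaMP's implementation: the summary does not list the model's terms, so the four coefficients (launch, per-wave, per-CTA and padding cost), the hypothetical `TileConfig` fields, the helper names and all numbers are illustrative assumptions. The sketch derives CTA grid geometry from per-expert token counts, predicts a cost for each candidate configuration, and picks the cheapest.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TileConfig:
    """Hypothetical kernel configuration (field names are illustrative, not RaMP's)."""
    name: str
    tile_m: int       # routed tokens handled by one CTA along M
    tiles_n: int      # CTAs per row tile along N
    ctas_per_sm: int  # assumed resident CTAs per SM (occupancy)


def grid_stats(histogram, cfg, num_sms):
    """Derive CTA-grid geometry from a per-expert token histogram."""
    row_tiles = [math.ceil(t / cfg.tile_m) for t in histogram]  # 0 tokens -> 0 tiles
    ctas = sum(row_tiles) * cfg.tiles_n
    waves = math.ceil(ctas / (num_sms * cfg.ctas_per_sm))       # full scheduling waves
    padded = sum(r * cfg.tile_m - t for r, t in zip(row_tiles, histogram))  # wasted rows
    return ctas, waves, padded


def predicted_cost(histogram, cfg, num_sms, coeffs):
    """Hypothetical four-coefficient wave cost, in arbitrary time units."""
    c_launch, c_wave, c_cta, c_pad = coeffs
    ctas, waves, padded = grid_stats(histogram, cfg, num_sms)
    return c_launch + c_wave * waves + c_cta * ctas + c_pad * padded


def pick_config(histogram, configs, num_sms, fitted):
    """Return the configuration with the lowest predicted cost."""
    return min(configs, key=lambda c: predicted_cost(histogram, c, num_sms, fitted[c.name]))


SMALL = TileConfig("small", tile_m=16, tiles_n=1, ctas_per_sm=4)
LARGE = TileConfig("large", tile_m=64, tiles_n=1, ctas_per_sm=1)
# Made-up coefficients standing in for values fitted from one-time profiling.
FITTED = {"small": (5.0, 10.0, 0.1, 0.01), "large": (5.0, 9.0, 0.1, 0.01)}

balanced = [64] * 8        # 512 routed tokens, spread evenly over 8 experts
skewed = [456] + [8] * 7   # also 512 routed tokens, concentrated on one expert
for hist in (balanced, skewed):
    print(sum(hist), pick_config(hist, [SMALL, LARGE], num_sms=4, fitted=FITTED).name)
```

With these made-up coefficients, both histograms carry 512 tokens, yet the balanced one favors the large tile and the skewed one favors the small tile (the loop prints `512 large`, then `512 small`). That is exactly the kind of difference a batch-size-only dispatcher cannot see.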

Key facts

  • RaMP is a routing-aware dispatch framework for MoE inference.
  • Production systems dispatch on batch size alone, leaving 10-70% of kernel throughput unrealized.
  • Performance-region analysis derives, from hardware constants, when each optimization helps.
  • Correctly predicted outcomes on all 8 tested architectures, including 3 previously unseen ones.
  • Four-parameter wave cost model selects fastest configuration from runtime expert histogram.
  • Achieves 0.93% mean regret versus exhaustive search.
  • Fitted from 10-24 minutes of one-time profiling per model.
  • Kernel-agnostic: applied to Alpha-MoE, it delivers a 1.14x speedup with no source modification.
  • Co-designed CuTe DSL kernel exposes 134-268 polymorphic configurations.
  • Combined with that kernel, RaMP delivers up to a 1.22x kernel speedup.
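The 0.93% mean-regret figure compares the cost model's pick against the best configuration found by exhaustive search. The summary does not define regret precisely, so this sketch assumes the common definition of relative slowdown, t_chosen / t_best - 1, averaged over test cases; the function name `mean_regret` and the timings are hypothetical.

```python
def mean_regret(measured, chosen):
    """Mean relative regret of a selector versus exhaustive search (assumed definition).

    measured: one dict per test case mapping config name -> measured latency.
    chosen:   the config name the cost model picked for each case.
    """
    if not measured or len(measured) != len(chosen):
        raise ValueError("need one pick per non-empty list of test cases")
    regrets = []
    for times, pick in zip(measured, chosen):
        best = min(times.values())            # exhaustive-search optimum
        regrets.append(times[pick] / best - 1.0)
    return sum(regrets) / len(regrets)


cases = [
    {"a": 10.0, "b": 12.0},  # picking "a" is optimal here
    {"a": 20.0, "b": 19.0},  # picking "a" is about 5.3% slower than "b"
]
print(f"{mean_regret(cases, ['a', 'a']):.2%}")  # prints 2.63%
```

A regret near zero means the model almost always lands on, or very close to, the configuration that exhaustive profiling would have chosen.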
