MobileMoE: On-Device MoE Language Models Achieve New Pareto Frontier

ai-technology · 2026-05-27

MobileMoE introduces a family of on-device Mixture-of-Experts (MoE) language models with 0.3-0.9B active parameters and 1.3-5.3B total parameters, establishing a new Pareto frontier for on-device LLMs. The authors derive a scaling law for on-device MoE that selects the architecture best suited to mobile memory and compute limits. It identifies moderate sparsity combined with fine-grained and shared experts as the optimal configuration. Training follows a four-phase pipeline (pre-training, mid-training, instruction fine-tuning, and quantization-aware training) that uses only open-source datasets. Across 14 benchmarks, MobileMoE matches or outperforms existing models.
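The gap between active and total parameters is what lets an MoE model keep per-token compute low while storing many more weights. The sketch below shows that bookkeeping under a generic, simplified decoder-only layout: 4·d² attention weights per layer, SwiGLU-style experts with 3·d·d_expert weights each, a linear router, tied embeddings, and norms and biases ignored. Every dimension in `HYPOTHETICAL_CONFIG` is invented for illustration. These values are not MobileMoE's published configuration, and the layout is an assumption rather than the paper's architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    vocab_size: int
    d_model: int
    n_layers: int
    d_expert: int   # hidden width of each (fine-grained) expert FFN
    n_routed: int   # experts chosen per token by the router
    n_shared: int   # experts applied to every token
    top_k: int      # routed experts activated per token


# Hypothetical values for illustration only; NOT MobileMoE's actual config.
HYPOTHETICAL_CONFIG = MoEConfig(
    vocab_size=32_000, d_model=1024, n_layers=16,
    d_expert=512, n_routed=64, n_shared=2, top_k=6,
)


def param_counts(cfg: MoEConfig) -> tuple[int, int]:
    """Return (total, active) parameter counts under the simplified layout."""
    embed = cfg.vocab_size * cfg.d_model        # tied input/output embedding
    attn = 4 * cfg.d_model * cfg.d_model        # Q, K, V, O projections
    router = cfg.d_model * cfg.n_routed         # one logit per routed expert
    expert = 3 * cfg.d_model * cfg.d_expert     # SwiGLU: gate, up, down

    # All experts must be stored; only shared + top_k routed experts run per token.
    total_layer = attn + router + (cfg.n_routed + cfg.n_shared) * expert
    active_layer = attn + router + (cfg.top_k + cfg.n_shared) * expert
    return embed + cfg.n_layers * total_layer, embed + cfg.n_layers * active_layer


def weight_bytes(n_params: int, bits_per_weight: int) -> int:
    """Storage for all weights at a uniform bit width (ignores scales/metadata)."""
    return n_params * bits_per_weight // 8


if __name__ == "__main__":
    total, active = param_counts(HYPOTHETICAL_CONFIG)
    print(f"total:  {total / 1e9:.2f}B parameters")
    print(f"active: {active / 1e9:.2f}B parameters per token")
    print(f"4-bit weights: {weight_bytes(total, 4) / 2**30:.2f} GiB")
```

With these made-up numbers, the sketch reports about 1.76B total and 0.30B active parameters. At an assumed 4 bits per weight, storing every expert takes roughly 0.82 GiB. Total parameters drive the memory footprint on a phone, while per-token compute tracks the active count.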

Key facts

MobileMoE models have 0.3-0.9B active parameters and 1.3-5.3B total parameters.
The scaling law optimizes the MoE architecture under mobile memory and compute constraints.
The optimal configuration uses moderate sparsity with fine-grained and shared experts.
Training includes pre-training, mid-training, instruction fine-tuning, and quantization-aware training.
All training data is from open-source datasets.
MobileMoE is evaluated on 14 benchmarks.
The models establish a new Pareto frontier for on-device LLMs.
The work is published on arXiv (2605.27358).
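Fine-grained experts are many small routed FFNs rather than a few large ones. Shared experts are applied to every token alongside the routed ones. The pure-Python sketch below shows one common way a layer can combine the two: a softmax router, top-k selection with renormalized gate weights, and shared-expert outputs added unweighted. It is a generic illustration under those assumptions, not MobileMoE's implementation. The paper's routing function, normalization, and load-balancing scheme are not described here, and the ReLU experts are a stand-in for whatever FFN the models actually use.

```python
import math
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def make_expert(d_model, d_hidden, rng):
    """A small two-layer FFN expert: up-projection, ReLU, down-projection."""
    return (rand_matrix(d_hidden, d_model, rng), rand_matrix(d_model, d_hidden, rng))


def expert_forward(expert, x):
    up, down = expert
    hidden = [max(0.0, v) for v in matvec(up, x)]
    return matvec(down, hidden)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def moe_forward(x, router, routed, shared, top_k):
    """Route one token vector x; return (output, chosen routed expert ids)."""
    probs = softmax(matvec(router, x))
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)

    out = [0.0] * len(x)
    for expert in shared:                    # shared experts see every token
        out = [o + y for o, y in zip(out, expert_forward(expert, x))]
    for i in chosen:                         # only top_k routed experts run
        gate = probs[i] / norm               # renormalize over selected experts
        out = [o + gate * y for o, y in zip(out, expert_forward(routed[i], x))]
    return out, chosen


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, d_hidden, n_routed, n_shared, top_k = 8, 4, 16, 1, 2  # toy sizes
    router = rand_matrix(n_routed, d_model, rng)
    routed = [make_expert(d_model, d_hidden, rng) for _ in range(n_routed)]
    shared = [make_expert(d_model, d_hidden, rng) for _ in range(n_shared)]
    token = [rng.uniform(-1.0, 1.0) for _ in range(d_model)]
    y, chosen = moe_forward(token, router, routed, shared, top_k)
    print("routed experts used:", chosen, "of", n_routed)
```

Per token, only `n_shared + top_k` expert FFNs execute, even though all `n_shared + n_routed` experts must be stored. This is the distinction behind the active and total parameter counts quoted for MobileMoE.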
