ARTFEED — Contemporary Art Intelligence

MobileMoE: On-Device MoE Language Models Achieve New Pareto Frontier

ai-technology · 2026-05-27

MobileMoE is a family of on-device Mixture-of-Experts (MoE) language models with 0.3-0.9B active parameters and 1.3-5.3B total parameters, establishing a new Pareto frontier for on-device LLMs. The study derives a scaling law for on-device MoE that optimizes the architecture under mobile memory and compute constraints, and identifies moderate sparsity combined with fine-grained and shared experts as the optimal configuration. Training follows a four-phase process (pre-training, mid-training, instruction fine-tuning, quantization-aware training) that uses only open-source datasets. Across 14 benchmarks, MobileMoE matches or surpasses the performance of current models.
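The gap between active and total parameters can be illustrated with a rough count of the feed-forward weights in an MoE stack. The sketch below is not the MobileMoE architecture: the `MoEConfig` fields, the gated three-matrix expert layout, and every numeric value are assumptions chosen only to show how shared experts, routed experts, and top-k routing enter the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    # Every field below is a hypothetical illustration value,
    # not a setting reported for MobileMoE.
    d_model: int    # hidden size
    d_expert: int   # intermediate size of one fine-grained expert
    n_layers: int   # number of MoE layers
    n_routed: int   # routed experts per layer
    n_shared: int   # shared experts per layer (always active)
    top_k: int      # routed experts selected per token


def expert_params(cfg: MoEConfig) -> int:
    # Assumes a gated FFN with up, gate, and down projections; biases ignored.
    return 3 * cfg.d_model * cfg.d_expert


def ffn_param_counts(cfg: MoEConfig) -> tuple[int, int]:
    """Return (total, active) FFN parameters across all layers."""
    per_expert = expert_params(cfg)
    router = cfg.d_model * cfg.n_routed  # linear router, one logit per routed expert
    # Total: every expert must be stored in memory.
    total = cfg.n_layers * ((cfg.n_routed + cfg.n_shared) * per_expert + router)
    # Active: only shared experts plus the top-k routed experts run per token.
    active = cfg.n_layers * ((cfg.top_k + cfg.n_shared) * per_expert + router)
    return total, active


cfg = MoEConfig(d_model=1024, d_expert=512, n_layers=16,
                n_routed=32, n_shared=1, top_k=4)
total, active = ffn_param_counts(cfg)
print(f"total FFN params:  {total / 1e9:.2f}B")
print(f"active FFN params: {active / 1e9:.2f}B")
print(f"total/active ratio: {total / active:.1f}x")
```

In this simplified view, total parameters drive the memory footprint while active parameters drive per-token compute, which is why an on-device design has to trade the two off against each other. Attention and embedding weights are left out of the count.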

Key facts

  • MobileMoE models have 0.3-0.9B active parameters and 1.3-5.3B total parameters.
  • The scaling law optimizes MoE architecture for mobile memory and compute constraints.
  • Optimal configuration uses moderate sparsity with fine-grained and shared experts.
  • Training includes pre-training, mid-training, instruction fine-tuning, and quantization-aware training.
  • All training data is from open-source datasets.
  • MobileMoE is evaluated on 14 benchmarks.
  • The models establish a new Pareto frontier for on-device LLMs.
  • The work is published on arXiv (2605.27358).
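For readers unfamiliar with the term, a Pareto frontier here means the set of models that no other model beats on both cost and quality at once. A minimal sketch of that filter follows; the function name `pareto_frontier` and all data points are placeholders invented for illustration and are not results from the paper.

```python
def pareto_frontier(models):
    """Keep models that no other model dominates.

    Each model is (name, cost, score). Model B dominates model A when B has
    cost <= A's cost and score >= A's score, with at least one strict.
    """
    frontier = []
    for name, cost, score in models:
        dominated = any(
            c <= cost and s >= score and (c < cost or s > score)
            for _, c, s in models
        )
        if not dominated:
            frontier.append((name, cost, score))
    return sorted(frontier, key=lambda m: m[1])


# Placeholder points: (name, active parameters in billions, average score).
# These numbers are made up and do not describe any real model.
candidates = [
    ("model_a", 0.3, 40.0),
    ("model_b", 0.5, 38.0),  # dominated by model_a: costlier and worse
    ("model_c", 0.6, 47.0),
    ("model_d", 0.9, 52.0),
    ("model_e", 1.0, 50.0),  # dominated by model_d: costlier and worse
]

frontier = pareto_frontier(candidates)
for name, cost, score in frontier:
    print(name, cost, score)
```

A claim of a "new Pareto frontier" then amounts to saying that the new models land on this set once added to the pool of existing models, under whichever cost and quality measures the authors use.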

Entities

Institutions

  • arXiv
