Pruning 75% of Experts from MoE LLMs for Translation with Minimal Loss

ai-technology · 2026-05-28

A new method aggressively prunes experts from mixture-of-experts (MoE) large language models (LLMs) to produce efficient translation specialists. The approach exploits two properties, expert specialization and the separability of multilingual capabilities, to identify and remove translation-irrelevant experts without retraining. Pruning half of all experts yields negligible degradation, pruning 70% causes only minor losses, and pruning 75% followed by a short round of supervised fine-tuning (SFT) recovers baseline performance. The result is a drastic reduction in the memory and compute needed for translation tasks.
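The article does not say how translation-irrelevant experts are identified. As a minimal sketch, assuming a common heuristic (score each expert by how often the router selects it on translation calibration data, then keep the most-used ones), the selection step for a single MoE layer might look like the code below. The routing-log format and all function names (`expert_usage`, `select_experts_to_keep`, `prune_router`) are hypothetical and are not the paper's code.

```python
from collections import Counter


def expert_usage(routing_log, num_experts):
    """Count how often the router selected each expert.

    routing_log uses a hypothetical format: one list of selected expert ids
    per token, collected while running translation calibration data.
    """
    counts = Counter()
    for selected in routing_log:
        counts.update(selected)
    # Counter returns 0 for experts that were never selected.
    return [counts[e] for e in range(num_experts)]


def select_experts_to_keep(usage, prune_ratio):
    """Return sorted ids of the experts that survive pruning prune_ratio of them."""
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    num_experts = len(usage)
    # Round down so we never prune more experts than requested.
    num_keep = num_experts - int(num_experts * prune_ratio)
    # Most frequently routed experts first; ties broken by lower expert id.
    ranked = sorted(range(num_experts), key=lambda e: (-usage[e], e))
    return sorted(ranked[:num_keep])


def prune_router(router_rows, keep):
    """Keep only the router rows (one per expert) for surviving experts.

    Position i in the result corresponds to original expert keep[i];
    no weights are updated, so this step involves no retraining.
    """
    return [router_rows[e] for e in keep]


if __name__ == "__main__":
    # Toy top-2 routing decisions for five tokens over eight experts.
    log = [[0, 3], [3, 5], [0, 3], [1, 3], [0, 7]]
    usage = expert_usage(log, num_experts=8)
    keep = select_experts_to_keep(usage, prune_ratio=0.75)
    print(usage, keep)
```

On this toy log, pruning 75% of the eight experts keeps experts 0 and 3, the two most frequently routed. A real model would repeat the selection for every MoE layer, and the kept set must stay at least as large as the router's top-k.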

Key facts

Method prunes experts from MoE LLMs for translation
Exploits expert specialization and separable multilingual capabilities
Pruning 50% of experts yields negligible degradation
Pruning 70% causes only minor losses
Pruning 75% followed by short SFT recovers baseline performance
No retraining required for moderate pruning
Reduces memory and compute requirements
Published on arXiv with ID 2605.28042
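To make the memory side of the claim concrete, the sketch below estimates weight storage after pruning. It assumes, hypothetically, that all experts are the same size and that pruning removes the same fraction of experts everywhere; the 2B shared / 28B expert parameter split in the example is invented for illustration and is not taken from the paper. It models weight memory only, not compute.

```python
def pruned_param_count(shared_params, expert_params, prune_ratio):
    """Parameters (in billions) left after removing prune_ratio of the experts.

    Assumes equal-sized experts, so expert parameters shrink in proportion
    to the fraction kept. Shared parameters (attention, embeddings, and,
    for simplicity, routers) are left untouched.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    return shared_params + expert_params * (1.0 - prune_ratio)


def weight_memory_gb(params_billions, bytes_per_param=2):
    """Weight storage in GB (1e9 bytes); 2 bytes per parameter models bf16/fp16."""
    return params_billions * bytes_per_param


if __name__ == "__main__":
    # Hypothetical model: 2B shared + 28B expert parameters (not from the paper).
    for ratio in (0.0, 0.5, 0.7, 0.75):
        params = pruned_param_count(2.0, 28.0, ratio)
        print(f"prune {ratio:.0%}: {params:.1f}B params, "
              f"{weight_memory_gb(params):.1f} GB in bf16")
```

Under these assumptions, pruning 75% of experts shrinks the hypothetical model from 30B to 9B parameters (60 GB to 18 GB of bf16 weights); the larger the share of parameters that sits in experts, the larger the saving.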
