Pruning 75% of Experts from MoE LLMs for Translation with Minimal Loss
A new method aggressively prunes experts from mixture-of-experts (MoE) large language models to create efficient translation specialists. The approach exploits expert specialization and separable multilingual capabilities to identify and remove translation-irrelevant experts, with no retraining needed at moderate pruning ratios. Pruning half of all experts yields negligible degradation, pruning 70% causes only minor losses, and pruning 75% followed by short supervised fine-tuning (SFT) recovers baseline performance. The pruned models need substantially less memory and compute for translation tasks.
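The summary above does not say how translation-irrelevant experts are identified. As a rough illustration only, the sketch below assumes one common heuristic: log which experts the router selects on translation data, then keep the most frequently used experts in each layer. The function name `select_experts_to_keep`, the `routing_log` structure, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def select_experts_to_keep(routing_log, num_experts, prune_fraction):
    """Pick the experts to keep in each layer by routing frequency.

    routing_log: dict mapping layer index -> list of expert indices that the
        router selected while processing translation data (hypothetical format).
    num_experts: number of experts per layer before pruning.
    prune_fraction: fraction of experts to remove, e.g. 0.75.

    Assumption: usage frequency is a stand-in for "translation relevance";
    the actual criterion used by the method may differ.
    """
    keep_count = num_experts - int(num_experts * prune_fraction)
    kept = {}
    for layer, chosen in routing_log.items():
        counts = Counter(chosen)
        # Most-used experts first; ties are broken by the lower expert index.
        ranked = sorted(range(num_experts), key=lambda e: (-counts[e], e))
        kept[layer] = sorted(ranked[:keep_count])
    return kept


# Toy routing log: 2 layers, 8 experts per layer (made-up data).
routing_log = {
    0: [0, 0, 1, 3, 3, 3, 0, 1],
    1: [2, 2, 2, 5, 5, 7, 2, 5],
}

kept = select_experts_to_keep(routing_log, num_experts=8, prune_fraction=0.75)
print(kept)
```

With 8 experts per layer and a 75% pruning ratio, two experts survive in each layer. In a real model, the kept indices would then be used to slice the expert weights and the router's output dimension, a step this sketch leaves out.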
Key facts
- Method prunes experts from MoE LLMs for translation
- Exploits expert specialization and separable multilingual capabilities
- Pruning 50% of experts yields negligible degradation
- Pruning 70% causes only minor losses
- Pruning 75% followed by short supervised fine-tuning (SFT) recovers baseline performance
- No retraining required for moderate pruning
- Reduces memory and compute requirements
- Published on arXiv with ID 2605.28042
Entities
Institutions
- arXiv