Mix-MoE: Mixed Mixture-of-Experts for Multilingual Machine Translation
Mix-MoE is a framework that addresses the parameter interference that arises when large language models (LLMs) are fine-tuned for multilingual machine translation (MT). It uses a mixed Mixture-of-Experts (MoE) architecture whose experts fall into two specialized groups: Language Model Experts (LM Experts), which hold monolingual knowledge, and Machine Translation Experts (MT Experts), which hold bilingual translation knowledge. Training proceeds in two stages of post-pretraining with MoE: first on monolingual corpora, then on parallel corpora. The goal is to improve multilingual MT performance while retaining the knowledge of the pretrained LLM.
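To make the idea of two expert groups concrete, here is a minimal pure-Python sketch of an MoE layer whose experts are tagged as either LM or MT experts. It is an illustration, not the paper's implementation: the class name `MixedMoELayer`, the single shared router, the top-k routing rule, and the use of plain linear maps as experts are all assumptions, since the summary does not describe how Mix-MoE routes tokens.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MixedMoELayer:
    """Toy MoE layer whose experts are split into an LM group and an MT group.

    Hypothetical design: one router scores all experts and the top-k are mixed.
    """

    def __init__(self, dim, n_lm, n_mt, top_k=2, seed=0):
        rng = random.Random(seed)
        # Group tag per expert: LM experts first, then MT experts.
        self.groups = ["lm"] * n_lm + ["mt"] * n_mt
        # Each expert is a dim x dim linear map (a stand-in for a real FFN).
        self.experts = [random_matrix(dim, dim, rng) for _ in self.groups]
        # Router produces one logit per expert.
        self.router = random_matrix(len(self.groups), dim, rng)
        self.top_k = top_k

    def forward(self, x):
        logits = matvec(self.router, x)
        # Pick the k experts with the highest router logits.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = top[: self.top_k]
        # Renormalize the selected logits into mixing weights.
        weights = softmax([logits[i] for i in top])
        out = [0.0] * len(x)
        for w, i in zip(weights, top):
            y = matvec(self.experts[i], x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, [(i, self.groups[i]) for i in top]


layer = MixedMoELayer(dim=4, n_lm=2, n_mt=2)
output, routed = layer.forward([1.0, 0.5, -0.5, 2.0])
print(routed)  # which experts, and from which group, handled this token
```

Keeping the group tag next to each expert makes it possible to treat the two groups differently during training, for example updating only one group at a time.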
Key facts
- Mix-MoE is a mixed Mixture-of-Experts framework for multilingual machine translation.
- It addresses the parameter interference that arises when LLMs are fine-tuned on parallel corpora.
- The framework has two training stages: post-pretraining on monolingual corpora, then on parallel corpora.
- MoE layers are divided into Language Model Experts (LM Experts) and Machine Translation Experts (MT Experts).
- LM Experts capture and retain monolingual knowledge from the pretrained LLM.
- MT Experts are trained to acquire bilingual translation knowledge.
- The approach aims to improve multilingual MT performance while retaining pretrained knowledge.
- The paper is available on arXiv with ID 2605.24681.
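The two training stages listed above could be organised as a small schedule that pairs each stage with its corpus type. The sketch below is hedged: the summary does not say which parameters are updated in each stage, so the `trainable` field (LM experts in stage 1, MT experts in stage 2) is purely an assumption made for illustration, and the names `Stage`, `SCHEDULE`, and `trainable_experts` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    corpus: str           # kind of data used in this stage (from the summary)
    trainable: frozenset  # ASSUMPTION: expert groups updated in this stage


# Stage order and corpus types follow the summary; the trainable sets do not.
SCHEDULE = (
    Stage("stage_1", "monolingual", frozenset({"lm"})),
    Stage("stage_2", "parallel", frozenset({"mt"})),
)


def trainable_experts(stage, expert_groups):
    """Return indices of experts whose group is updated in the given stage."""
    return [i for i, g in enumerate(expert_groups) if g in stage.trainable]


# Example layout: two LM experts followed by two MT experts.
expert_groups = ["lm", "lm", "mt", "mt"]
plan = {s.name: trainable_experts(s, expert_groups) for s in SCHEDULE}
print(plan)
```

Under this assumed split, the experts trained on monolingual data are left untouched while the parallel data is used, which is one simple way to limit interference between the two kinds of knowledge.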
Entities
Institutions
- arXiv