Mix-MoE: Mixed Mixture-of-Experts for Multilingual Machine Translation

ai-technology · 2026-05-26

A new framework called Mix-MoE addresses the parameter interference that arises when large language models (LLMs) are fine-tuned on parallel corpora for multilingual machine translation (MT). The approach uses a mixed Mixture-of-Experts (MoE) architecture with two specialized expert groups: Language Model Experts (LM Experts) for monolingual knowledge and Machine Translation Experts (MT Experts) for bilingual translation knowledge. Training proceeds in two stages: the MoE model is first post-pretrained on monolingual corpora and then trained on parallel corpora. The goal is to improve multilingual MT performance while retaining the knowledge acquired during pretraining.
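The summary does not say how inputs are routed between the two expert groups. The following is a minimal, hypothetical sketch that assumes one softmax router with top-k selection over the combined pool of LM Experts and MT Experts. `MixMoELayer`, `route`, `top_k`, and the toy experts and router weights are illustrative names and values, not the paper's implementation or API.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class MixMoELayer:
    """Hypothetical mixed MoE layer: one router over an LM group and an MT group."""

    def __init__(self, lm_experts: List[Expert], mt_experts: List[Expert],
                 router_weights: List[Vector], top_k: int = 2):
        self.experts = lm_experts + mt_experts
        # Group label per expert index, so routing decisions can be inspected.
        self.groups = ["LM"] * len(lm_experts) + ["MT"] * len(mt_experts)
        if len(router_weights) != len(self.experts):
            raise ValueError("need one router weight row per expert")
        self.router_weights = router_weights
        self.top_k = top_k

    def route(self, x: Vector) -> List[Tuple[int, float]]:
        # Linear router logits, softmax, then keep the top-k experts and
        # renormalize their gate values so they sum to 1.
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in self.router_weights]
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        norm = sum(probs[i] for i in top)
        return [(i, probs[i] / norm) for i in top]

    def forward(self, x: Vector) -> Vector:
        # Output is the gate-weighted sum of the selected experts' outputs.
        out = [0.0] * len(x)
        for i, gate in self.route(x):
            y = self.experts[i](x)
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out


# Toy experts: each maps a 2-d vector to a 2-d vector (purely illustrative).
lm_experts = [lambda v: [2 * a for a in v], lambda v: [a + 1 for a in v]]
mt_experts = [lambda v: [-a for a in v], lambda v: [0.0 for _ in v]]
# One router row per expert, in order LM0, LM1, MT0, MT1.
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
layer = MixMoELayer(lm_experts, mt_experts, router, top_k=2)

for x in ([1.0, 0.5], [-1.0, -0.5]):
    chosen = [layer.groups[i] for i, _ in layer.route(x)]
    print(x, chosen, layer.forward(x))
```

With these toy weights, the first input is routed to the two LM Experts and the second to the two MT Experts. A shared router is only one design option; the paper may instead route per group, per language pair, or per training stage.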

Key facts

Mix-MoE is a mixed Mixture-of-Experts framework for multilingual machine translation.
It addresses parameter interference in fine-tuning LLMs with parallel corpora.
The framework has two training stages: post-pretraining on monolingual corpora, followed by training on parallel corpora.
MoE layers are divided into Language Model Experts (LM Experts) and Machine Translation Experts (MT Experts).
LM Experts capture and retain monolingual knowledge from the pretrained LLM.
MT Experts are trained to acquire bilingual translation knowledge.
The approach aims to improve multilingual MT performance.
The paper is available on arXiv with ID 2605.24681.
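The summary lists the two training stages but not which parameters are updated in each. The sketch below assumes that stage 1 updates only the LM Experts and stage 2 updates only the MT Experts while the LM Experts stay frozen, which is one plausible way to "retain" pretrained knowledge and is not a detail confirmed by the source. `Stage`, `SCHEDULE`, `sgd_step`, and the `"<group>.<index>"` parameter naming are hypothetical, and the router is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Stage:
    name: str
    corpus: str            # "monolingual" or "parallel"
    trainable: List[str]   # expert groups whose parameters receive updates


# ASSUMPTION: the per-stage trainable groups are illustrative, not the paper's recipe.
SCHEDULE = [
    Stage("post-pretraining", "monolingual", ["LM"]),
    Stage("translation training", "parallel", ["MT"]),
]


def sgd_step(params: Dict[str, List[float]], grads: Dict[str, List[float]],
             stage: Stage, lr: float = 0.1) -> Dict[str, List[float]]:
    """Plain SGD that only updates parameters whose expert group is trainable."""
    new: Dict[str, List[float]] = {}
    for name, values in params.items():
        group = name.split(".")[0]  # hypothetical naming: "<group>.<index>"
        if group in stage.trainable:
            new[name] = [v - lr * g for v, g in zip(values, grads[name])]
        else:
            new[name] = list(values)  # frozen group: copied unchanged
    return new


demo_params = {"LM.0": [1.0, 2.0], "MT.0": [1.0, 2.0]}
demo_grads = {name: [1.0] * len(v) for name, v in demo_params.items()}  # dummy gradients
for stage in SCHEDULE:
    demo_params = sgd_step(demo_params, demo_grads, stage)
    print(stage.name, stage.corpus, demo_params)
```

In this sketch, freezing by group keeps the LM Experts' weights exactly unchanged during parallel-corpus training. Whether Mix-MoE freezes, regularizes, or jointly trains them should be checked against the arXiv paper.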
