ARTFEED — Contemporary Art Intelligence

Mix-MoE: Mixed Mixture-of-Experts for Multilingual Machine Translation

ai-technology · 2026-05-26

A new framework called Mix-MoE addresses the parameter interference that arises when large language models (LLMs) are fine-tuned on parallel corpora for multilingual machine translation (MT). It uses a mixed Mixture-of-Experts (MoE) architecture whose experts are split into two groups: Language Model Experts (LM Experts), which hold monolingual knowledge, and Machine Translation Experts (MT Experts), which learn bilingual translation knowledge. Training proceeds in two stages: MoE post-pretraining on monolingual corpora, followed by training on parallel corpora. The goal is to improve multilingual MT performance while retaining the knowledge acquired during pretraining.
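
To make the "two groups of experts inside one MoE layer" idea concrete, here is a minimal sketch of a single token passing through such a layer. The summary does not describe the routing, so the design here is an assumption: one router scores every expert in both groups, the top-k experts are selected regardless of group, and their outputs are mixed with softmax gates. The names `Expert`, `mixed_moe_forward`, `router`, and `top_k` are hypothetical, and each expert is a toy linear map.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass
class Expert:
    """Toy linear expert; `group` is "LM" or "MT" (hypothetical labels)."""
    name: str
    group: str
    weights: List[Vector]  # square matrix applied to the token vector

    def __call__(self, x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_moe_forward(
    x: Vector, experts: List[Expert], router: List[Vector], top_k: int = 2
) -> Tuple[Vector, List[str]]:
    """Route one token over the union of LM and MT experts (assumed design)."""
    # One router score per expert: dot product of its router row with x.
    scores = [sum(r * xi for r, xi in zip(row, x)) for row in router]
    # Keep the top_k highest-scoring experts, regardless of their group.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalize the chosen scores into gate weights that sum to 1.
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, [experts[i].name for i in chosen]
```

Routing over the union of both groups lets a single token draw on monolingual and translation experts at once; a real system could instead route within each group separately, and the summary does not say which choice Mix-MoE makes.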

Key facts

  • Mix-MoE is a mixed Mixture-of-Experts framework for multilingual machine translation.
  • It addresses parameter interference in fine-tuning LLMs with parallel corpora.
  • The framework has two training stages: MoE post-pretraining on monolingual corpora, then training on parallel corpora.
  • MoE layers are divided into Language Model Experts (LM Experts) and Machine Translation Experts (MT Experts).
  • LM Experts capture and retain monolingual knowledge from the pretrained LLM.
  • MT Experts are trained to acquire bilingual translation knowledge.
  • The approach aims to improve multilingual MT performance.
  • The paper is available on arXiv with ID 2605.24681.
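
The two training stages and expert roles listed above can be sketched as a toy update rule. This rests on a loudly stated assumption: the summary says LM Experts retain monolingual knowledge and MT Experts acquire translation knowledge, but it does not say which parameters are frozen in each stage, so the sketch assumes stage one updates only LM Experts and stage two updates only MT Experts. `STAGES`, `ExpertParam`, and `apply_stage_update` are hypothetical names, and each expert is reduced to a single scalar parameter.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical schedule: stage name -> (corpus type, expert groups to update).
# The corpora come from the summary; the frozen/trainable split is assumed.
STAGES: Dict[str, Tuple[str, Set[str]]] = {
    "post_pretraining": ("monolingual", {"LM"}),
    "translation_training": ("parallel", {"MT"}),
}


@dataclass
class ExpertParam:
    """A single scalar standing in for one expert's parameters (toy)."""
    name: str
    group: str  # "LM" or "MT"
    value: float


def apply_stage_update(
    params: List[ExpertParam], stage: str, grads: Dict[str, float], lr: float = 0.1
) -> List[ExpertParam]:
    """One plain gradient step that touches only the stage's trainable groups."""
    _corpus, trainable = STAGES[stage]
    for p in params:
        if p.group in trainable:
            p.value -= lr * grads[p.name]
        # Experts outside the trainable set stay frozen for this step.
    return params
```

Freezing one group while training the other is one simple way to keep the second stage from overwriting what the first stage learned, which matches the stated aim of retaining pretrained knowledge; the paper may use a different mechanism.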

Entities

Institutions

  • arXiv

Sources