COMPASS: Continual Multilingual PEFT with Adaptive Semantic Sampling
Researchers propose COMPASS, a data-centric framework for adapting large language models to target languages while mitigating negative cross-lingual interference. The method uses parameter-efficient fine-tuning (PEFT) with lightweight, language-specific adapters trained on a selected subset of auxiliary multilingual data. A distribution-aware sampling strategy uses multilingual embeddings and clustering to identify semantic gaps, then prioritizes auxiliary data from under-represented clusters to maximize positive transfer. The framework is further extended to a continual learning setting. The paper is available on arXiv under ID 2604.20720.
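The sampling step described above can be illustrated with a minimal sketch. The summary does not name the embedding model, the clustering algorithm, or the scoring rule, so everything below is an assumption: plain k-means over precomputed embedding vectors, and a hypothetical gap score `1 / (1 + n_target)` that ranks an auxiliary example higher when its cluster contains few target-language examples. `gap_aware_sample` and its parameters are illustrative names, not the COMPASS implementation.

```python
from collections import Counter
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[Vector], List[int]]:
    """Plain Lloyd's k-means (assumption: the paper's clustering method is not specified here)."""
    # Deterministic init from the first k points (a simplification; k-means++ is more robust).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Recompute each centroid as the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


def gap_aware_sample(target: List[Vector], aux: List[Vector], k: int, budget: int) -> List[int]:
    """Return indices of `budget` auxiliary examples, favoring clusters the target data barely covers.

    Hypothetical scoring rule: gap(c) = 1 / (1 + number of target examples in cluster c).
    """
    # Cluster target and auxiliary embeddings jointly so both share one semantic partition.
    _, labels = kmeans(target + aux, k)
    target_counts = Counter(labels[: len(target)])
    aux_labels = labels[len(target):]
    gap = {c: 1.0 / (1 + target_counts[c]) for c in range(k)}
    # Rank auxiliary examples by their cluster's gap score (ties broken by original order).
    ranked = sorted(range(len(aux)), key=lambda i: (-gap[aux_labels[i]], i))
    return ranked[:budget]


if __name__ == "__main__":
    target = [[0.0, 0.0], [0.2, 0.1]]                          # target-language data: one semantic region
    aux = [[10.0, 10.0], [0.1, 0.0], [9.8, 10.1], [0.0, 0.2]]  # auxiliary pool: two regions
    print(gap_aware_sample(target, aux, k=2, budget=2))        # → [0, 2]
```

In the toy run, the two auxiliary points near (10, 10) are selected because the target data has no coverage there. Greedy ranking drains the highest-gap cluster first; a per-cluster quota proportional to the gap score would preserve more diversity. Which variant COMPASS uses is not stated in this summary.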
Key facts
- COMPASS stands for Continual Multilingual PEFT with Adaptive Semantic Sampling.
- The framework addresses performance disparities across languages in LLMs.
- It uses parameter-efficient fine-tuning (PEFT) with language-specific adapters.
- A distribution-aware sampling strategy identifies semantic gaps using multilingual embeddings and clustering.
- The method prioritizes auxiliary data from under-represented semantic clusters.
- COMPASS is further extended to a continual learning setting.
- The paper is available on arXiv under ID 2604.20720.
- The approach aims to maximize positive cross-lingual transfer while minimizing interference.
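The combination of language-specific adapters and continual learning listed above can be sketched as a frozen shared layer with one small trainable adapter per language. The summary does not say which adapter type COMPASS uses; the LoRA-style low-rank form here (output = W·x + scale·B·(A·x), with B initialized to zero) is an assumption, and `MultilingualLinear`, `add_language`, and `lora_param_count` are hypothetical names. Adding a language creates a new adapter without touching earlier ones, which is one common way adapter-based methods limit forgetting; it is not confirmed as the paper's exact mechanism.

```python
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one rank-r adapter: A is r x d_in, B is d_out x r."""
    return rank * (d_in + d_out)


class LoRAAdapter:
    """Hypothetical LoRA-style adapter computing scale * B @ (A @ x)."""

    def __init__(self, d_in: int, d_out: int, rank: int, alpha: float = 1.0):
        # Small deterministic A for reproducibility (real LoRA initializes A randomly).
        self.A: Matrix = [[0.01 * (i + j + 1) for j in range(d_in)] for i in range(rank)]
        # B starts at zero, so a fresh adapter leaves the base layer's output unchanged.
        self.B: Matrix = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def delta(self, x: Vector) -> Vector:
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]


class MultilingualLinear:
    """Frozen shared weight plus one trainable adapter per language (assumed design)."""

    def __init__(self, weight: Matrix):
        self.weight = weight  # frozen: never updated when a language is added
        self.adapters: Dict[str, LoRAAdapter] = {}

    def add_language(self, lang: str, rank: int) -> LoRAAdapter:
        # Continual step: a new language gets its own adapter; existing adapters are untouched.
        d_out, d_in = len(self.weight), len(self.weight[0])
        self.adapters[lang] = LoRAAdapter(d_in, d_out, rank)
        return self.adapters[lang]

    def forward(self, x: Vector, lang: str) -> Vector:
        base = matvec(self.weight, x)
        adapter = self.adapters.get(lang)
        if adapter is None:  # no adapter yet: fall back to the shared base layer
            return base
        return [b + d for b, d in zip(base, adapter.delta(x))]


if __name__ == "__main__":
    # Parameter-efficiency arithmetic for one 4096 x 4096 projection with a rank-8 adapter.
    full = 4096 * 4096
    lora = lora_param_count(4096, 4096, rank=8)
    print(lora, f"{lora / full:.2%}")  # → 65536 0.39%
```

For a 4096 × 4096 projection, a rank-8 adapter trains 8 × (4096 + 4096) = 65,536 parameters instead of 16,777,216, roughly 0.39% of the full matrix, which is what makes keeping one adapter per language affordable.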
Entities
Institutions
- arXiv