DeMix: Decoupling Search from Training for LLM Data Mixing
The Decouple Searching from Training Mix (DeMix) framework proposes using model merging to predict optimal data ratios for pre-training Large Language Models (LLMs). Existing approaches either rely on unreliable small-scale proxy experiments or require prohibitively expensive large-scale exploration. Instead of training a separate proxy model for every sampled mixture, DeMix trains component models on candidate datasets at scale and derives data mixture proxies via weighted model merging, which decouples search from training costs. As a result, an unlimited number of sampled mixtures can be evaluated without any additional training, and running more search trials improves mixture discovery. The method targets the difficulty of balancing general competence with proficiency on hard tasks such as mathematics and code.
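The merging step can be illustrated with a minimal sketch. It assumes that "weighted model merging" means a linear weighted average of parameters, which the summary does not confirm, and that each component model is represented as a plain dictionary of NumPy arrays. The function name `merge_components` and the parameter name `layer.weight` are hypothetical and used only for illustration.

```python
import numpy as np


def merge_components(components, weights):
    """Build a mixture proxy by linearly averaging component parameters.

    Assumption: merging is a per-parameter weighted average; the actual
    DeMix merging rule may differ.
    """
    w = np.asarray(weights, dtype=float)
    if len(components) != len(w):
        raise ValueError("need exactly one weight per component model")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()  # normalize so the weights behave like mixture ratios

    merged = {}
    for name in components[0]:
        # Same parameter name, same shape, weighted sum across components.
        merged[name] = sum(wi * c[name] for wi, c in zip(w, components))
    return merged


# Two toy "component models", e.g. one trained on general text, one on math.
general = {"layer.weight": np.array([1.0, 2.0])}
math_model = {"layer.weight": np.array([3.0, 6.0])}
proxy = merge_components([general, math_model], weights=[0.75, 0.25])
```

Under this assumption, the merge weights play the role of candidate data ratios, so building a new proxy costs only a few array operations rather than a training run.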
Key facts
- DeMix is a novel framework for LLM pre-training data mixing.
- It uses model merging to predict optimal data ratios.
- Component models are trained on candidate datasets at scale.
- Data mixture proxies are derived via weighted model merging.
- Search is decoupled from training costs.
- Unlimited sampled mixtures can be evaluated without extra training.
- The goal is to balance general competence with proficiency on hard tasks such as mathematics and code.
- Existing approaches rely on unreliable proxy experiments or expensive exploration.
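The search side of the facts above can be sketched as a simple loop: sample many candidate ratios, merge the already-trained components into a proxy for each, score the proxy, and keep the best ratio. This is an illustrative sketch, not the authors' procedure. It assumes flattened parameter vectors, uniform Dirichlet sampling of ratios, and a hypothetical `score_fn` that stands in for real benchmark evaluation; none of these details are given in the summary.

```python
import numpy as np


def search_mixture(component_params, score_fn, n_samples=1000, seed=0):
    """Return the best sampled ratio and its score, with no training.

    component_params: array of shape (n_components, n_params), one
        flattened parameter vector per trained component model.
    score_fn: hypothetical evaluator mapping a merged parameter vector
        to a scalar where higher is better.
    """
    rng = np.random.default_rng(seed)
    k = component_params.shape[0]
    # Each row is a candidate mixture: non-negative and summing to 1.
    ratios = rng.dirichlet(np.ones(k), size=n_samples)
    # Every row of `proxies` is one merged model (linear merge assumed).
    proxies = ratios @ component_params
    scores = np.array([score_fn(p) for p in proxies])
    best = int(np.argmax(scores))
    return ratios[best], float(scores[best])


# Toy setup: three components and a made-up scoring target.
components = np.eye(3)
target = np.array([0.2, 0.3, 0.5])


def toy_score(params):
    # Placeholder for benchmark evaluation of a merged proxy model.
    return -float(np.sum((params - target) ** 2))


best_ratio, best_score = search_mixture(components, toy_score)
```

Because each trial only merges and evaluates, raising `n_samples` increases search coverage without adding training cost, which is the decoupling the list describes.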