ARTFEED — Contemporary Art Intelligence

Optimizing Language Mixture Ratio for Llama-3 Continual Pre-Training

ai-technology · 2026-04-30

A new arXiv paper (2409.06624) investigates how to choose the Additional Language Mixture Ratio (ALMR) and Learning Rate (LR) for Continual Pre-Training (CPT) of Large Language Models (LLMs), with the goal of strengthening Chinese language ability. The study runs CPT on Llama-3 8B and 70B models, establishing a correlation between ALMR and LR at the 8B scale that directly indicates the optimal experimental setup. With the tuned hyper-parameters and subsequent fine-tuning, model performance improves on Chinese-related benchmarks as well as in specific domains. The work addresses the gap between small-scale scaling-law experiments and full-size model deployment, providing systematic guidance for CPT hyper-parameter selection.
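The hyper-parameter search the paper describes can be sketched as a joint sweep over (ALMR, LR) pairs scored by validation loss. The grid values and the loss surface below are invented for illustration; the real study trains CPT runs on Llama-3 8B rather than evaluating a closed-form proxy.

```python
import itertools

# Hypothetical proxy for validation loss after a CPT run; the quadratic
# bowl and its optimum (ALMR=0.3, LR=3e-4) are made up for illustration.
def proxy_val_loss(almr: float, lr: float) -> float:
    return (almr - 0.3) ** 2 + (1e4 * (lr - 3e-4)) ** 2

# Illustrative candidate grids for the Additional Language Mixture Ratio
# and the learning rate (not the paper's actual search space).
almr_grid = [0.1, 0.2, 0.3, 0.4, 0.5]
lr_grid = [1e-4, 2e-4, 3e-4, 5e-4]

# Exhaustively score every (ALMR, LR) pair and keep the lowest-loss one.
best = min(itertools.product(almr_grid, lr_grid),
           key=lambda pair: proxy_val_loss(*pair))
print(best)  # -> (0.3, 0.0003)
```

In a real setting each grid point is a full (if downscaled) training run, which is why the paper's observed ALMR-LR correlation matters: it prunes the grid before committing compute at 70B scale.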

Key facts

  • Paper arXiv:2409.06624v4
  • Focuses on Continual Pre-Training (CPT) for Llama-3 8B and 70B
  • Enhances Chinese language ability
  • Studies optimal Additional Language Mixture Ratio (ALMR) and Learning Rate (LR)
  • Bridges the gap between small-scale scaling-law experiments and full-size model deployment
  • Improves performance on Chinese-related benchmarks
  • Hyper-parameter tuning and fine-tuning involved
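The ALMR itself is just the probability of drawing a training example from the additional-language corpus. A minimal sketch, assuming two toy in-memory corpora (real CPT would stream tokenized shards):

```python
import random

def mixed_stream(primary, additional, almr, seed=0):
    """Yield examples, drawing from `additional` (e.g. Chinese text)
    with probability `almr` and from `primary` otherwise.
    Corpora are cycled as plain lists in this illustrative sketch."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    i = j = 0
    while True:
        if rng.random() < almr:
            yield additional[j % len(additional)]
            j += 1
        else:
            yield primary[i % len(primary)]
            i += 1

# Toy corpora with hypothetical labels.
en = ["en_0", "en_1", "en_2"]
zh = ["zh_0", "zh_1"]

stream = mixed_stream(en, zh, almr=0.25)
batch = [next(stream) for _ in range(1000)]
share = sum(x.startswith("zh") for x in batch) / len(batch)
print(share)  # empirical Chinese share, close to the 0.25 target
```

The 0.25 ratio here is arbitrary; the paper's contribution is precisely that the best value of this knob is coupled to the learning rate rather than chosen in isolation.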

Sources