Vocabulary Dropout Prevents Diversity Collapse in LLM Co-Evolution
A new method called vocabulary dropout addresses diversity collapse in co-evolutionary self-play for large language models. In this setup, one model (the proposer) generates problems and another (the solver) solves them, but the proposer often converges to a narrow set of problems. Vocabulary dropout applies a random mask to the proposer's output logits during training and generation, preventing fixation on specific token sequences. Experiments with Qwen3-4B and Qwen3-8B on mathematical reasoning in the R-Zero framework show sustained diversity across lexical, semantic, and functional metrics, with solver improvements averaging +4.4 points for the 8B model.
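To make the mechanism concrete, here is a minimal sketch of a hard logit mask of the kind the summary describes. The function name, the PyTorch framing, and the drop probability are illustrative assumptions, not the authors' implementation.

```python
import torch

def vocabulary_dropout(logits: torch.Tensor, drop_prob: float = 0.1) -> torch.Tensor:
    """Hypothetical sketch: hard-mask a random subset of the vocabulary.

    Dropped entries are set to -inf so they can never be sampled; kept
    entries are left unscaled (a hard mask, unlike standard activation
    dropout, which rescales survivors by 1 / (1 - p)).
    """
    vocab_size = logits.shape[-1]
    # One Bernoulli keep/drop draw per vocabulary entry, shared across the batch.
    keep = torch.rand(vocab_size, device=logits.device) >= drop_prob
    return logits.masked_fill(~keep, float("-inf"))

# Toy usage: sample a token from masked logits.
logits = torch.randn(1, 32)  # stand-in for the proposer's next-token logits
masked = vocabulary_dropout(logits, drop_prob=0.25)
token = torch.distributions.Categorical(logits=masked).sample()
```

Because dropped tokens receive -inf, they get exactly zero probability after the softmax, so the proposer is forced to route around them rather than merely down-weighting them.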
Key facts
- Vocabulary dropout is a random mask applied to the proposer's output logits.
- It prevents the proposer from locking into fixed token sequences.
- The mask is hard (dropped tokens are excluded outright rather than down-weighted) and non-stationary (it changes over the course of training); see the resampling sketch after this list.
- Experiments used Qwen3-4B and Qwen3-8B models.
- Training was on mathematical reasoning in the R-Zero framework.
- Diversity was sustained across lexical, semantic, and functional metrics.
- Solver improvements averaged +4.4 points for the 8B model.
- The method is lightweight and requires no human supervision.
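Since the mask is non-stationary, the dropped subset must change over time. One plausible reading (an assumption; the summary does not give the schedule) is that the mask is periodically resampled, for example every few steps or per generation episode, as in this toy loop:

```python
import torch

torch.manual_seed(0)
vocab_size, resample_every = 8, 2  # toy sizes for illustration

mask = None
for step in range(6):
    # Non-stationary: draw a fresh hard mask on a fixed schedule so the
    # proposer cannot adapt to any single masked-out region of the vocabulary.
    if step % resample_every == 0:
        mask = torch.rand(vocab_size) >= 0.25  # keep ~75% of tokens (assumed rate)
    logits = torch.randn(1, vocab_size)        # stand-in for proposer logits
    masked = logits.masked_fill(~mask, float("-inf"))
    token = torch.distributions.Categorical(logits=masked).sample()
    print(step, mask.int().tolist(), token.item())
```

Under this reading, each resampling shifts which regions of the vocabulary are reachable, which is what prevents the proposer from locking onto any fixed token sequence.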