Vocabulary Dropout Prevents Diversity Collapse in LLM Co-Evolution
A new method called vocabulary dropout addresses diversity collapse in co-evolutionary self-play for large language models. In this setup, one model (the proposer) generates problems and another (the solver) solves them, but the proposer often converges to a narrow set of problems. Vocabulary dropout applies a random mask to the proposer's output logits during training and generation, preventing fixation on specific token sequences. Experiments with Qwen3-4B and Qwen3-8B on mathematical reasoning in the R-Zero framework show sustained diversity across lexical, semantic, and functional metrics, with solver improvements averaging +4.4 points for the 8B model.
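To make the mechanism concrete, here is a minimal sketch of a hard logit mask of the kind the summary describes. The function name, the PyTorch framing, and the drop probability are illustrative assumptions, not the authors' implementation.

```python
import torch

def vocabulary_dropout(logits: torch.Tensor, drop_prob: float = 0.1) -> torch.Tensor:
    """Hypothetical sketch: hard-mask a random subset of the vocabulary.

    Dropped entries are set to -inf so they can never be sampled; kept
    entries are left unscaled (a hard mask, unlike standard activation
    dropout, which rescales survivors by 1 / (1 - p)).
    """
    vocab_size = logits.shape[-1]
    # One Bernoulli keep/drop draw per vocabulary entry, shared across the batch.
    keep = torch.rand(vocab_size, device=logits.device) >= drop_prob
    return logits.masked_fill(~keep, float("-inf"))

# Toy usage: sample a token from masked logits.
logits = torch.randn(1, 32)  # stand-in for the proposer's next-token logits
masked = vocabulary_dropout(logits, drop_prob=0.25)
token = torch.distributions.Categorical(logits=masked).sample()
```

Because dropped tokens receive -inf, they get exactly zero probability after the softmax, so the proposer is forced to route around them rather than merely down-weighting them.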
Key facts
- Vocabulary dropout is a random mask applied to the proposer's output logits.
- It prevents the proposer from locking into fixed token sequences.
- The mask is hard (dropped tokens are excluded outright rather than down-weighted) and non-stationary (it changes over the course of training); see the resampling sketch after this list.
- Experiments used Qwen3-4B and Qwen3-8B models.
- Training was on mathematical reasoning in the R-Zero framework.
- Diversity was sustained across lexical, semantic, and functional metrics.
- Solver improvements averaged +4.4 points for the 8B model.
- The method is lightweight and requires no human supervision.
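Since the mask is non-stationary, the dropped subset must change over time. One plausible reading (an assumption; the summary does not give the schedule) is that the mask is periodically resampled, for example every few steps or per generation episode, as in this toy loop:

```python
import torch

torch.manual_seed(0)
vocab_size, resample_every = 8, 2  # toy sizes for illustration

mask = None
for step in range(6):
    # Non-stationary: draw a fresh hard mask on a fixed schedule so the
    # proposer cannot adapt to any single masked-out region of the vocabulary.
    if step % resample_every == 0:
        mask = torch.rand(vocab_size) >= 0.25  # keep ~75% of tokens (assumed rate)
    logits = torch.randn(1, vocab_size)        # stand-in for proposer logits
    masked = logits.masked_fill(~mask, float("-inf"))
    token = torch.distributions.Categorical(logits=masked).sample()
    print(step, mask.int().tolist(), token.item())
```

Under this reading, each resampling shifts which regions of the vocabulary are reachable, which is what prevents the proposer from locking onto any fixed token sequence.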