ARTFEED — Contemporary Art Intelligence

Vocabulary Dropout Prevents Diversity Collapse in LLM Co-Evolution

ai-technology · 2026-04-30

A new method called vocabulary dropout addresses diversity collapse in co-evolutionary self-play for large language models. In this setup, one model (the proposer) generates problems and another (the solver) solves them, but the proposer often converges to a narrow set of problems. Vocabulary dropout applies a random mask to the proposer's output logits during training and generation, preventing fixation on specific token sequences. Experiments with Qwen3-4B and Qwen3-8B on mathematical reasoning via R-Zero show sustained diversity across lexical, semantic, and functional metrics, with solver improvements averaging +4.4 points at 8B.
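The core mechanism can be sketched in a few lines: before the proposer samples its next token, a randomly chosen subset of the vocabulary has its logits set to negative infinity so those tokens cannot be sampled. This is a minimal illustrative sketch, not the paper's implementation; the function name, the `drop_rate` parameter, and the `protected_ids` escape hatch are assumptions.

```python
import random

def vocab_dropout_logits(logits, drop_rate, rng, protected_ids=frozenset()):
    """Apply a hard random vocabulary mask to proposer logits.

    Hypothetical sketch: each token id is independently dropped with
    probability `drop_rate` by setting its logit to -inf, so it cannot
    be sampled. Resampling the mask on every call makes it
    non-stationary, so the proposer cannot fixate on one token sequence.
    `protected_ids` (an assumption here) keeps special tokens such as
    EOS always sampleable.
    """
    masked = []
    for token_id, logit in enumerate(logits):
        if token_id not in protected_ids and rng.random() < drop_rate:
            masked.append(float("-inf"))  # token excluded this step
        else:
            masked.append(logit)
    return masked

# Toy usage: 10-token vocabulary, token 0 protected (e.g. EOS).
rng = random.Random(0)
out = vocab_dropout_logits([0.5] * 10, drop_rate=0.3, rng=rng,
                           protected_ids={0})
```

Because the mask is applied at the logit level, it composes with any sampling strategy (temperature, top-p) without changing the rest of the generation loop.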

Key facts

  • Vocabulary dropout is a random mask applied to the proposer's output logits.
  • It prevents the proposer from locking into fixed token sequences.
  • The mask is hard (dropped tokens are fully excluded from sampling) and non-stationary (resampled over time rather than fixed).
  • Experiments used Qwen3-4B and Qwen3-8B models.
  • Training was on mathematical reasoning via R-Zero.
  • Diversity was sustained across lexical, semantic, and functional metrics.
  • Solver improvements averaged +4.4 points at 8B.
  • The method is lightweight and requires no human supervision.
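The lexical side of the diversity claim above is typically measured with a distinct-n style metric: the ratio of unique n-grams to total n-grams across the proposer's generated problems. The article does not specify the exact metric, so this is an assumed illustration; a collapsed proposer that repeats near-identical problems scores near 0, a diverse one near 1.

```python
def distinct_n(texts, n=2):
    """Distinct-n lexical diversity: unique n-grams / total n-grams.

    Assumed illustrative metric (not confirmed as the paper's exact
    measure). Whitespace tokenization keeps the sketch self-contained.
    """
    total = 0
    unique = set()
    for text in texts:
        tokens = text.split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0

# A collapsed proposer repeats one problem; a diverse one varies them.
collapsed = ["solve x + 1 = 2"] * 4
diverse = ["solve x + 1 = 2", "integrate sin x dx",
           "count primes below 50", "factor 91 into primes"]
```

Tracking this ratio over training steps is a cheap way to detect the onset of diversity collapse before solver performance degrades.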

Entities

Institutions

  • arXiv
