Entropy Collapse in RLVR: A Unified Theoretical Framework
arXiv paper 2510.10150 studies entropy collapse in Reinforcement Learning with Verifiable Rewards (RLVR), a training approach for improving the reasoning capabilities of Large Language Models (LLMs). The authors derive an analytical approximation for the change in token-level entropy at each update step and identify four factors that govern it. Building on this, they present a unified theoretical framework that explains how existing heuristic entropy interventions affect entropy dynamics, and they expose a shared limitation of recent methods: each one heuristically adjusts only one or two of the four factors. The paper supports its analysis with both theoretical and empirical results on entropy dynamics in RLVR.
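The quantity at the center of the paper is token-level entropy, i.e., the Shannon entropy of the policy's next-token distribution; entropy collapse means this value shrinks toward zero as training sharpens the policy. The sketch below is not the paper's derivation, just the standard entropy computation over softmax logits, with a toy sequence of increasingly peaked logits standing in for successive RL updates:

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution
    obtained by applying softmax to raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

# Toy illustration of collapse: as logits sharpen across (hypothetical)
# update steps, the per-token entropy falls toward zero.
for step, logits in enumerate([[0.0, 0.0, 0.0],   # uniform: H = ln 3
                               [2.0, 0.0, 0.0],
                               [6.0, 0.0, 0.0]]):
    print(f"step {step}: H = {token_entropy(logits):.4f}")
```

The uniform distribution gives the maximum entropy (ln 3 here), and each sharpened set of logits yields a strictly smaller value, which is the qualitative behavior the paper's per-step approximation decomposes into its four governing factors.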
Key facts
- arXiv paper number: 2510.10150
- Focuses on Reinforcement Learning with Verifiable Rewards (RLVR)
- Addresses entropy collapse in LLM training
- Derives analytical approximation for token-level entropy change
- Identifies four governing factors of entropy dynamics
- Provides unified theoretical framework for entropy interventions
- Reveals limitation of heuristic adjustments in recent approaches
- Includes both theoretical and empirical analyses