SCRL: Curriculum RL Enables Credit Assignment for LLM Reasoning
Researchers introduce SCRL (Subproblem Curriculum Reinforcement Learning), a framework that improves LLM reasoning by breaking hard problems into verifiable subproblems. Standard outcome-based RLVR (reinforcement learning with verifiable rewards) learns little when correct rollouts are rare and cannot leverage partial progress. SCRL instead derives subproblems from reference reasoning chains and uses subproblem-level normalization to assign finer-grained credit without external rubrics. This turns partial progress into learning signals, lifting hard problems out of gradient dead zones.
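The "gradient dead zone" idea can be illustrated with a minimal sketch. It assumes a GRPO-style group-normalized advantage, which this summary does not confirm as SCRL's actual formulation. The function `group_normalize`, the subproblem names, and the reward values are all hypothetical and chosen only for illustration. When every rollout on the original problem fails, the normalized advantages are all zero, while easier subproblems normalized within their own groups still produce a nonzero signal.

```python
from statistics import mean, pstdev

def group_normalize(rewards, eps=1e-6):
    """Center and scale rewards within one group (GRPO-style; an assumption)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical binary rewards for 4 rollouts per (sub)problem.
# The last entry is the original problem, which no rollout solves.
rewards_by_subproblem = {
    "subproblem_1": [1.0, 1.0, 0.0, 1.0],
    "subproblem_2": [1.0, 0.0, 0.0, 1.0],
    "original_problem": [0.0, 0.0, 0.0, 0.0],
}

# Outcome-only training sees just the original problem: every advantage is
# zero, so the policy gradient vanishes (the "dead zone").
outcome_adv = group_normalize(rewards_by_subproblem["original_problem"])

# Subproblem-level normalization: each subproblem's rollouts are normalized
# within their own group, so partial progress yields nonzero advantages.
subproblem_adv = {
    name: group_normalize(rewards)
    for name, rewards in rewards_by_subproblem.items()
}

print("outcome-only:", outcome_adv)
for name, adv in subproblem_adv.items():
    print(name, [round(a, 3) for a in adv])
```

In this toy setup the original problem alone contributes no gradient, but the two easier subproblems do, which is the intuition behind using a curriculum to turn partial progress into learning signal.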
Key facts
- SCRL stands for Subproblem Curriculum Reinforcement Learning.
- It addresses the inefficiency of outcome-based RLVR on hard problems.
- It derives verifiable subproblems from reference reasoning chains.
- The final subproblem in each curriculum is fixed to be the original problem.
- It uses subproblem-level normalization for finer-grained credit assignment.
- No external rubrics or reward models are needed.
- Analysis shows subproblem curricula lift hard problems out of gradient dead zones.
- Published on arXiv with ID 2605.22074.
Entities
Institutions
- arXiv