SCRL: Curriculum RL Enables Credit Assignment for LLM Reasoning

other · 2026-05-23

Researchers introduce SCRL (Subproblem Curriculum Reinforcement Learning), a framework that improves LLM reasoning by breaking hard problems into verifiable subproblems. Standard outcome-based reinforcement learning with verifiable rewards (RLVR) rewards only the final answer, so it learns slowly when correct rollouts are rare and cannot use partial progress. SCRL instead derives subproblems from reference reasoning chains and applies subproblem-level normalization to assign finer-grained credit without external rubrics. Partial progress thereby becomes a learning signal, lifting hard problems out of gradient dead zones.
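A gradient dead zone can be illustrated with a small sketch. It assumes a group-normalized advantage (mean-centered, scaled by standard deviation), a common choice in outcome-based RLVR; the summary does not say which estimator SCRL builds on, so the function name `group_advantages` and the reward values are purely illustrative.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt.

    Assumed estimator (not taken from the source): (r - mean) / (std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hard problem: all 8 rollouts fail the final-answer check.
all_fail = group_advantages([0.0] * 8)

# Same problem, but one rollout happens to succeed.
one_success = group_advantages([0.0] * 7 + [1.0])

print(all_fail)      # every advantage is 0.0, so the policy gradient vanishes
print(one_success)   # the successful rollout gets a positive advantage
```

Under this assumption, a problem on which every sampled rollout fails contributes no gradient at all, however close some rollouts came to the answer.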

Key facts

SCRL stands for Subproblem Curriculum Reinforcement Learning.
It addresses the inefficiency of outcome-based RLVR on hard problems, where correct rollouts are rare.
It derives verifiable subproblems from reference reasoning chains.
It fixes the final subproblem in each curriculum as the original problem.
It uses subproblem-level normalization for finer-grained credit assignment.
No external rubrics or reward models are needed.
Analysis shows subproblem curricula lift hard problems out of gradient dead zones.
Published on arXiv with ID 2605.22074.
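The facts above can be sketched in code under stated assumptions. The helper names (`build_curriculum`, `normalize`, `subproblem_advantages`), the placeholder questions, and the reward values are all hypothetical. The normalization shown (mean-centering and standard-deviation scaling within each subproblem's rollouts) is one plausible reading of "subproblem-level normalization", not the paper's confirmed formula.

```python
from statistics import mean, pstdev


def build_curriculum(original_problem, intermediate_questions):
    """Hypothetical helper: intermediate questions are assumed to have been
    derived from a reference reasoning chain; the final subproblem is fixed
    as the original problem."""
    return list(intermediate_questions) + [original_problem]


def normalize(rewards, eps=1e-6):
    """Assumed normalization: (r - mean) / (std + eps) within one group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def subproblem_advantages(rewards_by_subproblem):
    """Normalize rewards separately for each subproblem's rollouts."""
    return {sp: normalize(rs) for sp, rs in rewards_by_subproblem.items()}


original = "original problem"
curriculum = build_curriculum(original, ["subproblem 1", "subproblem 2"])

# Illustrative verifier outcomes (1.0 = verified correct) for 4 rollouts each.
rewards = {
    "subproblem 1": [1.0, 0.0, 1.0, 0.0],
    "subproblem 2": [0.0, 0.0, 1.0, 0.0],
    original: [0.0, 0.0, 0.0, 0.0],  # nobody solves the full problem yet
}
advantages = subproblem_advantages(rewards)

for sp in curriculum:
    print(sp, [round(a, 2) for a in advantages[sp]])
```

In this toy setting the original problem alone yields all-zero advantages, while the easier subproblems still produce non-zero ones, which is the sense in which partial progress becomes a learning signal.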
