CuSearch: Curriculum Rollout Sampling for Agentic RAG Training
CuSearch is a proposed curriculum rollout sampling framework for training agentic retrieval-augmented generation (RAG) systems with Reinforcement Learning with Verifiable Rewards (RLVR). Current approaches treat all sampled trajectories as equally informative, but deeper-search trajectories contain more retrieval decision points and therefore provide richer supervision. Because heterogeneity in search depth grows as training progresses, CuSearch uses Search-Depth Greedy Allocation (SDGA) to shift the update budget toward these deeper-search trajectories. The framework is described in arXiv paper 2605.11611.
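This summary does not spell out SDGA's allocation rule. The Python sketch contrasts uniform rollout sampling with a simple depth-greedy selection under a fixed update budget. It assumes that search depth is the number of retrieval calls in a trajectory and that "greedy" means filling the budget deepest-first. `Trajectory`, `uniform_allocation`, `greedy_depth_allocation`, and `decision_points` are hypothetical names chosen for illustration, and the code is not the paper's SDGA.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    # Hypothetical rollout record; not a structure defined in the paper.
    traj_id: str
    search_depth: int  # assumed: number of retrieval calls (decision points)
    reward: float      # outcome-only verifiable reward; unused by the selection below


def uniform_allocation(trajs, budget, seed=0):
    """Baseline: every trajectory is equally likely to enter the update batch."""
    rng = random.Random(seed)
    return rng.sample(trajs, min(budget, len(trajs)))


def greedy_depth_allocation(trajs, budget):
    """Hypothetical SDGA-style rule: fill the budget with the deepest trajectories first.

    sorted() is stable, so trajectories with equal depth keep their original order.
    """
    ranked = sorted(trajs, key=lambda t: t.search_depth, reverse=True)
    return ranked[:budget]


def decision_points(batch):
    """Total retrieval decision points covered by an update batch."""
    return sum(t.search_depth for t in batch)


rollouts = [
    Trajectory("a", 1, 1.0),
    Trajectory("b", 4, 1.0),
    Trajectory("c", 0, 0.0),
    Trajectory("d", 3, 0.0),
    Trajectory("e", 2, 1.0),
]
greedy = greedy_depth_allocation(rollouts, budget=3)
print([t.traj_id for t in greedy])  # ['b', 'd', 'e']
print(decision_points(greedy))      # 9
print(decision_points(uniform_allocation(rollouts, budget=3)))  # at most 9
```

With the same budget of three updates, the greedy rule always covers at least as many retrieval decision points as uniform sampling, because it keeps the three largest depths. The sketch ignores rewards and any curriculum schedule; how SDGA handles those is not described in this summary.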
Key facts
- CuSearch is a curriculum rollout sampling framework for agentic RAG.
- It uses Search-Depth Greedy Allocation (SDGA) to prioritize deeper-search trajectories.
- Training uses RLVR, which learns from outcome-only supervision.
- Deeper-search trajectories provide denser supervision for the retrieval sub-policy.
- Uniform rollout sampling ignores depth heterogeneity that grows during training.
- The paper is available on arXiv with ID 2605.11611.
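Outcome-only supervision means the reward checks only the final answer, not the intermediate search steps. As an illustration, assuming a question-answering setup scored by exact match (a common choice for search-agent RLVR, but not confirmed by this summary), a verifiable reward could look like the sketch here. `normalize_answer` and `outcome_reward` are hypothetical names, and the normalization follows the SQuAD evaluation convention.

```python
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def outcome_reward(final_answer: str, gold_answers: list[str]) -> float:
    """Outcome-only verifiable reward: 1.0 on exact match with any gold answer, else 0.0.

    Only the final answer is checked; intermediate search calls earn no reward.
    """
    pred = normalize_answer(final_answer)
    return 1.0 if any(pred == normalize_answer(g) for g in gold_answers) else 0.0


print(outcome_reward("The Eiffel Tower.", ["Eiffel Tower"]))  # 1.0
print(outcome_reward("Paris", ["London", "Berlin"]))          # 0.0
```

Because a single scalar scores the whole trajectory, any learning signal for individual retrieval decisions has to come from how that reward is credited across the search calls a trajectory made.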
Entities
Institutions
- arXiv