CoSearch Framework Jointly Trains Reasoning Agents and Document Ranking Models via Reinforcement Learning
A recent study presents CoSearch, a framework that jointly trains multi-step reasoning agents and generative document ranking models using Group Relative Policy Optimization (GRPO). The method targets a shortcoming of existing agentic search systems such as Search-R1, which optimize only the reasoning agent while treating the retrieval system as a fixed tool. Preliminary experiments show that the gap between an oracle retrieval system and a fixed one reaches a relative F1 improvement of up to 26.8% across seven question-answering benchmarks, identifying retrieval as a key bottleneck for agentic search. In agentic search, an agent is trained to reason iteratively, generate search queries, and synthesize retrieved information to answer complex questions, with reinforcement learning driving much of the recent progress. CoSearch makes GRPO training practical for rankers whose inputs differ across reasoning trajectories, lifting the earlier constraint that retrieval components stay frozen during optimization.
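The defining feature of GRPO is that it replaces a learned value critic with group-relative reward normalization: several rollouts are sampled per input, and each rollout's advantage is its reward standardized against the group's own statistics. A minimal sketch of that computation, with illustrative names not taken from the paper:

```python
# Hedged sketch of GRPO's group-relative advantage computation.
# `group_relative_advantages` is an illustrative name, not the paper's API;
# rewards here stand in for per-trajectory F1 scores.
import statistics

def group_relative_advantages(rewards):
    """Standardize each rollout's reward against its group's mean and std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0 for _ in rewards]  # identical rewards carry no learning signal
    return [(r - mean) / std for r in rewards]

# Example: F1 rewards for four sampled trajectories answering one question.
advs = group_relative_advantages([0.8, 0.2, 0.5, 0.5])
```

Because the advantages are centered within each group, better-than-average trajectories are reinforced and worse-than-average ones are penalized without any separate value model.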
Key facts
- CoSearch jointly trains reasoning agents and document ranking models
- Uses Group Relative Policy Optimization (GRPO) for training
- Addresses limitations of existing approaches like Search-R1
- Fixed retrieval systems create performance bottlenecks in agentic search
- Oracle vs. fixed retrieval gap reaches a 26.8% relative F1 improvement across 7 QA benchmarks
- Agentic search involves iterative reasoning, querying, and information synthesis
- Reinforcement learning has driven recent progress in agentic search
- Research published on arXiv with identifier 2604.17555v1
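The iterative reasoning, querying, and synthesis loop from the facts above can be sketched as follows. `llm_reason`, `retrieve`, and `rank` are placeholders for the agent, the retriever, and the jointly trained ranker; this is an assumed control flow, not the paper's actual implementation.

```python
# Hedged sketch of an agentic search loop: the agent alternates between
# issuing queries and answering, while a ranker filters retrieved documents.
# All component names are hypothetical stand-ins.
def agentic_search(question, llm_reason, retrieve, rank, max_steps=4):
    context = []
    for _ in range(max_steps):
        step = llm_reason(question, context)   # decide: search more, or answer
        if step["action"] == "answer":
            return step["text"]
        docs = retrieve(step["query"])                  # candidate documents
        context.extend(rank(step["query"], docs)[:3])   # keep top-ranked docs
    # Step budget exhausted: force an answer from accumulated context.
    return llm_reason(question, context, force_answer=True)["text"]
```

In a jointly trained system like the one described, both `llm_reason` and `rank` would be policy models updated with GRPO, which is what requires handling ranker inputs that differ across reasoning trajectories.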
Entities
Institutions
- arXiv