LiteResearcher Framework Enables Scalable Agentic RL Training for Deep Research AI
Reinforcement learning has become a significant paradigm for training LLM-based research agents, but scaling it for deep research applications has been constrained by two intertwined problems: hand-crafted synthetic data, which often fails to develop authentic search capabilities, and dependence on real-world search during training, which is prohibitively costly and unstable. A new training framework called LiteResearcher addresses these scalability challenges by constructing a lite virtual environment that simulates real-world search dynamics, allowing the training recipe to be improved continuously without touching live search infrastructure. The approach enables a compact 4B-parameter model to surpass both open-source and commercial alternatives such as Tongyi DeepResearch and Claude-4.5 Sonnet on established benchmarks: LiteResearcher-4B achieved state-of-the-art open-source results of 71.3% on GAIA and 78.0% on Xbench. The research is documented in arXiv preprint 2604.17931v1.
Key facts
- LiteResearcher is a scalable training framework for agentic reinforcement learning
- It constructs a lite virtual world mirroring real-world search dynamics
- The framework enables continuous improvement in training recipes
- A 4B-parameter model outperforms large-scale open-source and commercial models
- Achieved 71.3% on GAIA benchmark
- Achieved 78.0% on Xbench benchmark
- Addresses challenges of hand-crafted synthetic data limitations
- Overcomes instability and high costs of real-world search dependency during RL training
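The "lite virtual world" idea above can be illustrated with a minimal sketch: instead of hitting a live search API during RL rollouts, the agent queries a frozen local corpus whose ranking includes a little noise to mimic real search variability, and receives a terminal reward with a per-step cost. All class names, the ranking heuristic, and the reward scheme here are illustrative assumptions, not LiteResearcher's actual implementation.

```python
import random

class SimulatedSearchEnv:
    """Hypothetical stand-in for a live search engine during RL training."""

    def __init__(self, corpus, answer, max_steps=5, seed=0):
        self.corpus = corpus          # {doc_id: text} snapshot standing in for the web
        self.answer = answer          # gold answer, used only to compute reward
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0

    def search(self, query):
        """Rank documents by naive term overlap, with small noise to
        mimic the variability of a real search engine."""
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(text.lower().split())) + self.rng.random() * 0.1, doc_id)
            for doc_id, text in self.corpus.items()
        ]
        scored.sort(reverse=True)
        self.steps += 1
        return [doc_id for _, doc_id in scored[:3]]

    def reward(self, final_answer):
        """Terminal reward: 1 for a correct answer, minus a small
        per-step cost to encourage efficient search behavior."""
        correct = 1.0 if final_answer.strip().lower() == self.answer.lower() else 0.0
        return correct - 0.05 * self.steps
```

Because the corpus and reward are fully local, rollouts are cheap, deterministic up to the seed, and safe to scale, which is the property the framework relies on:

```python
corpus = {
    "d1": "GAIA is a benchmark for general AI assistants",
    "d2": "Xbench evaluates deep research agents",
    "d3": "reinforcement learning trains agents from reward",
}
env = SimulatedSearchEnv(corpus, answer="GAIA")
hits = env.search("benchmark for general AI assistants")  # "d1" ranks first
score = env.reward("GAIA")
```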
Entities
Institutions
- arXiv