DR-Venus: 4B Deep Research Agent Trained on 10K Open Data
Researchers introduce DR-Venus, a frontier deep research agent built on a 4B-parameter small language model and designed for edge-scale deployment. Trained entirely on open data, it achieves strong performance with only 10K trajectories. The training recipe has two stages: agentic supervised fine-tuning (SFT) with strict data cleaning and resampling of long-horizon trajectories, followed by agentic reinforcement learning (RL) to improve execution reliability. To make RL more effective, the authors build on IGPO and design turn-level rewards based on information gain. The work is motivated by the cost, latency, and privacy advantages of edge-scale agents.
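The summary does not say which cleaning rules or resampling scheme the SFT stage uses. Purely as an illustration, the sketch below assumes a hypothetical `Trajectory` record, treats a trajectory as clean when its final answer is correct and every turn executed successfully, and upsamples long-horizon trajectories by repeating them a fixed number of times. The names `Trajectory`, `is_clean`, `clean_and_resample`, `long_threshold`, and `long_repeat`, and their default values, are invented for this example and are not DR-Venus's actual pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """Hypothetical record: one dict per turn plus a final-answer correctness flag."""
    turns: list = field(default_factory=list)  # each turn: {"ok": bool}, True if the tool call ran cleanly
    answer_correct: bool = False


def is_clean(traj: Trajectory) -> bool:
    # Assumed cleaning rule: non-empty, correct final answer, and no failed turns.
    return traj.answer_correct and bool(traj.turns) and all(t.get("ok", False) for t in traj.turns)


def clean_and_resample(trajs, long_threshold=10, long_repeat=3):
    """Drop unclean trajectories, then repeat long-horizon ones `long_repeat` times."""
    out = []
    for traj in trajs:
        if not is_clean(traj):
            continue
        copies = long_repeat if len(traj.turns) >= long_threshold else 1
        out.extend([traj] * copies)
    return out


# Usage: one short, one long, one with a failed turn, one with a wrong answer.
short = Trajectory(turns=[{"ok": True}] * 3, answer_correct=True)
long_ = Trajectory(turns=[{"ok": True}] * 12, answer_correct=True)
failed = Trajectory(turns=[{"ok": True}, {"ok": False}], answer_correct=True)
wrong = Trajectory(turns=[{"ok": True}] * 4, answer_correct=False)

data = clean_and_resample([short, long_, failed, wrong])
print(len(data))  # 4: one copy of `short`, three of `long_`
```

Repeating whole trajectories is the simplest way to shift an SFT mixture toward long-horizon behavior; weighted sampling by trajectory length would be an equally plausible alternative.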
Key facts
- DR-Venus is a 4B-parameter deep research agent.
- Trained on only 10K trajectories built from open data.
- Designed for edge-scale deployment.
- Two-stage training: agentic SFT then agentic RL.
- SFT applies strict data cleaning and resamples long-horizon trajectories.
- RL improves execution reliability on long-horizon tasks.
- RL builds on IGPO, with turn-level rewards based on information gain.
- Motivated by the cost, latency, and privacy benefits of edge-scale agents.
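To make the idea of a turn-level information-gain reward concrete, here is a minimal sketch under stated assumptions: the information gain of turn t is taken to be the increase in the policy's probability of the ground-truth answer after that turn, and the final turn receives the task's outcome reward instead. This follows the general IGPO idea, but the function names, the choice to replace (rather than add to) the final-turn gain, and the discounted-return step are assumptions made here, not details confirmed by this summary.

```python
import math


def answer_prob(answer_token_logprobs):
    """Policy probability of the ground-truth answer: exp of the sum of its token log-probs."""
    return math.exp(sum(answer_token_logprobs))


def info_gain_rewards(answer_probs, outcome_reward):
    """Turn-level rewards from answer probabilities.

    answer_probs[0] is measured before the first turn, answer_probs[t] after turn t.
    Turn t earns p_t - p_{t-1}; the final turn's gain is replaced by the outcome
    reward (an assumption of this sketch).
    """
    if len(answer_probs) < 2:
        raise ValueError("need a probability before and after at least one turn")
    rewards = [answer_probs[t] - answer_probs[t - 1] for t in range(1, len(answer_probs))]
    rewards[-1] = outcome_reward
    return rewards


def discounted_returns(rewards, gamma=1.0):
    """Per-turn return: this turn's reward plus discounted future rewards."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


# Usage: answer probability before turn 1 and after turns 1-3 of one rollout.
probs = [0.1, 0.3, 0.25, 0.8]
rewards = info_gain_rewards(probs, outcome_reward=1.0)  # approximately [0.2, -0.05, 1.0]
returns = discounted_returns(rewards)                   # approximately [1.15, 0.95, 1.0]
```

A turn that lowers the answer probability, such as a search that surfaces misleading evidence, receives a negative reward. This dense per-turn signal is what distinguishes turn-level rewards from a single outcome reward assigned only at the end of a long trajectory.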