DR-Venus: 4B Deep Research Agent Trained on 10K Open Data

ai-technology · 2026-04-24

Researchers introduce DR-Venus, a 4B-parameter deep research agent built on a small language model and designed for edge-scale deployment. It is trained entirely on open data and achieves strong performance with only 10K trajectories. The training recipe has two stages: agentic supervised fine-tuning (SFT) with strict data cleaning and resampling of long-horizon trajectories, followed by agentic reinforcement learning (RL) to improve execution reliability. To make RL more effective, the authors build on IGPO and design turn-level rewards based on information gain. The work is motivated by the cost, latency, and privacy advantages of edge-scale agents.
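
To make the idea of a turn-level, information-gain reward concrete, here is a minimal Python sketch. It rests on an assumption the summary does not spell out: that the "information gain" of a turn is the change in the probability the policy assigns to the ground-truth answer before and after that turn. The function names and the example probabilities are hypothetical and are not taken from the DR-Venus or IGPO code.

```python
from typing import List


def turn_level_information_gain(p_correct: List[float]) -> List[float]:
    """Return one reward per turn.

    Assumption (not from the source): p_correct[0] is the probability the
    policy assigns to the ground-truth answer before any turn, and
    p_correct[t] is the same probability after turn t.
    """
    if len(p_correct) < 2:
        return []
    # Reward for a turn = probability after the turn minus probability before it.
    return [after - before for before, after in zip(p_correct, p_correct[1:])]


def discounted_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Accumulate turn-level rewards backwards into per-turn returns."""
    returns: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Hypothetical probabilities for a four-turn trajectory.
probs = [0.05, 0.10, 0.40, 0.35, 0.90]
gains = turn_level_information_gain(probs)
returns = discounted_returns(gains)
```

In this toy trajectory the third turn receives a negative reward because it lowers the probability of the correct answer, which is the kind of dense per-turn signal that a single outcome reward at the end of a long trajectory cannot provide.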

Key facts

DR-Venus is a 4B parameter deep research agent.
Trained on only 10K open-data trajectories.
Designed for edge-scale deployment.
Two-stage training: agentic SFT then agentic RL.
SFT includes strict data cleaning and resampling of long-horizon trajectories.
RL improves execution reliability on long-horizon tasks.
RL uses IGPO and turn-level rewards based on information gain.
Targets the cost, latency, and privacy benefits of edge-scale agents.
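
The SFT data preparation step can be illustrated with a short Python sketch. The summary does not give the filtering criteria or the resampling ratio, so everything here is an assumption for illustration: the `Trajectory` fields, the rule of dropping malformed or incorrect trajectories, the `min_turns` threshold, and the `repeat` factor are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    turns: int         # number of tool-call turns in the trajectory
    well_formed: bool  # hypothetical flag: passed format checks
    correct: bool      # hypothetical flag: final answer matched the reference


def clean(trajs: List[Trajectory]) -> List[Trajectory]:
    """Keep only trajectories that are well formed and end in a correct answer."""
    return [t for t in trajs if t.well_formed and t.correct]


def resample_long_horizon(
    trajs: List[Trajectory], min_turns: int, repeat: int, seed: int = 0
) -> List[Trajectory]:
    """Repeat trajectories with at least `min_turns` turns `repeat` times."""
    rng = random.Random(seed)
    out: List[Trajectory] = []
    for t in trajs:
        copies = repeat if t.turns >= min_turns else 1
        out.extend([t] * copies)
    rng.shuffle(out)  # avoid long trajectories clustering in one place
    return out


# Hypothetical raw pool of four trajectories.
raw = [
    Trajectory(turns=3, well_formed=True, correct=True),
    Trajectory(turns=12, well_formed=True, correct=True),
    Trajectory(turns=15, well_formed=False, correct=True),
    Trajectory(turns=20, well_formed=True, correct=False),
]
cleaned = clean(raw)
resampled = resample_long_horizon(cleaned, min_turns=10, repeat=3)
```

Upsampling is only one plausible reading of "resampling"; the point of the sketch is that long-horizon examples, which tend to be rare in a raw trajectory pool, get more weight in the SFT mix after cleaning.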

Sources

arXiv cs.AI — 2026-04-23