DR Tulu-8B: Open-Source AI for Long-Form Research via Reinforcement Learning with Evolving Rubrics
Researchers have introduced Reinforcement Learning with Evolving Rubrics (RLER), a training method for deep research agents, and used it to develop DR Tulu-8B, described as the first fully open model trained for open-ended, long-form research. Prior open models rely on short-form QA tasks with verifiable rewards. RLER instead constructs and maintains rubrics that co-evolve with the policy model during training: the rubrics incorporate information the model newly uncovers through search, and they contrast model responses to support better fact-checking. DR Tulu-8B substantially outperforms existing open deep research models across four long-form benchmarks spanning science, healthcare, and general domains. The work is detailed in a paper on arXiv (2511.19399).
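The idea of rubrics that change alongside the policy can be illustrated with a toy sketch. This is not the paper's implementation. The names Rubric, score, and evolve are hypothetical, the keyword checks stand in for what would presumably be an LLM judge, and the rule of keeping only rubrics that separate the current responses is an assumption made for illustration.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Callable, List


@dataclass
class Rubric:
    """One grading criterion (hypothetical structure, not from the paper)."""
    description: str
    check: Callable[[str], bool]  # stand-in for an LLM judge
    weight: float = 1.0


def score(response: str, rubrics: List[Rubric]) -> float:
    """Reward = weighted fraction of rubrics the response satisfies."""
    total = sum(r.weight for r in rubrics)
    if total == 0:
        return 0.0
    return sum(r.weight for r in rubrics if r.check(response)) / total


def evolve(
    rubrics: List[Rubric],
    responses: List[str],
    propose: Callable[[List[str]], List[Rubric]],
    max_size: int = 5,
) -> List[Rubric]:
    """One evolution step (assumed rule): add rubrics proposed from the
    current responses, then keep only those that discriminate between them."""
    pool = rubrics + propose(responses)
    scored = [
        (pstdev([float(r.check(x)) for x in responses]), r) for r in pool
    ]
    # A rubric every response passes (or fails) carries no training signal.
    kept = [(s, r) for s, r in scored if s > 0]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in kept[:max_size]]


# Two sampled responses from the current policy (made-up example text).
responses = [
    "Aspirin reduces fever. [source: trial A]",
    "Aspirin reduces fever.",
]
initial = [Rubric("mentions aspirin", lambda t: "aspirin" in t.lower())]


def propose(batch: List[str]) -> List[Rubric]:
    # Hypothetical proposer: in practice this would contrast the responses
    # and the retrieved evidence to write new criteria.
    return [Rubric("cites a source", lambda t: "[source:" in t)]


evolved = evolve(initial, responses, propose)
rewards = [score(r, evolved) for r in responses]
```

In this toy run the initial rubric is satisfied by both responses, so it is dropped, while the newly proposed rubric separates them and becomes the sole source of reward. A real system would repeat this step throughout training as the policy's responses change.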
Key facts
- RLER stands for Reinforcement Learning with Evolving Rubrics.
- DR Tulu-8B is the first fully open model for open-ended, long-form deep research.
- The model substantially outperforms existing open deep research models on four long-form benchmarks.
- Benchmarks cover science, healthcare, and general domains.
- RLER uses rubrics that co-evolve with the policy model during training.
- Rubrics incorporate information from search and contrasting model responses.
- The paper is available on arXiv with ID 2511.19399.
- The research addresses limitations of training on short-form QA tasks.
Entities
Institutions
- arXiv