DR Tulu-8B: Open-Source AI for Long-Form Research via Reinforcement Learning with Evolving Rubrics

ai-technology · 2026-05-18

Researchers have introduced Reinforcement Learning with Evolving Rubrics (RLER), a training method for deep research agents, and used it to develop DR Tulu-8B, the first fully open model trained for open-ended, long-form research. Prior open models are trained largely on short-form QA tasks with verifiable rewards. RLER instead constructs and maintains rubrics that co-evolve with the policy model during training, incorporating information the model has newly explored through search and contrasting model responses to support better fact-checking. DR Tulu-8B substantially outperforms existing open deep research models across four long-form benchmarks in science, healthcare, and general domains. The work is detailed in a paper on arXiv (2511.19399).
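The co-evolving rubric idea can be illustrated with a small toy sketch. It is not the paper's implementation: the `Rubric` and `RubricPool` classes, the keyword-based `check` (a stand-in for an LLM judge), and the `propose_from_contrast` helper are all hypothetical names introduced here. The sketch only shows the general shape: a pool of rubrics scores responses, and new rubrics are added from the points on which sampled responses differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rubric:
    description: str
    check: Callable[[str], bool]  # hypothetical stand-in for an LLM judge
    weight: float = 1.0


@dataclass
class RubricPool:
    rubrics: List[Rubric] = field(default_factory=list)

    def score(self, response: str) -> float:
        """Reward = weighted fraction of rubrics the response satisfies."""
        total = sum(r.weight for r in self.rubrics)
        if total == 0:
            return 0.0
        earned = sum(r.weight for r in self.rubrics if r.check(response))
        return earned / total

    def evolve(self, responses: List[str],
               propose: Callable[[List[str]], List[Rubric]]) -> None:
        """Add newly proposed rubrics, skipping ones already in the pool."""
        known = {r.description for r in self.rubrics}
        for rubric in propose(responses):
            if rubric.description not in known:
                self.rubrics.append(rubric)
                known.add(rubric.description)


def keyword_rubric(keyword: str, weight: float = 1.0) -> Rubric:
    # Toy assumption: a rubric is satisfied if the keyword appears in the text.
    return Rubric(f"mentions {keyword}",
                  lambda text: keyword in text.lower(), weight)


def propose_from_contrast(responses: List[str]) -> List[Rubric]:
    # Toy assumption: words found in some but not all responses mark a
    # difference between them, so each becomes a candidate rubric.
    token_sets = [set(r.lower().split()) for r in responses]
    if not token_sets:
        return []
    differing = set().union(*token_sets) - set.intersection(*token_sets)
    return [keyword_rubric(word) for word in sorted(differing)]


pool = RubricPool([keyword_rubric("dose")])
rollouts = ["dose cited", "dose"]
before = [pool.score(r) for r in rollouts]
pool.evolve(rollouts, propose_from_contrast)
after = [pool.score(r) for r in rollouts]
```

In this toy run both rollouts receive the same score before the update, so the reward cannot tell them apart. After the pool evolves, the rollout containing the extra detail scores higher. In a real system the rubric proposer and the checker would presumably be model-based rather than keyword-based.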

Key facts

RLER stands for Reinforcement Learning with Evolving Rubrics.
DR Tulu-8B is the first fully open model for open-ended, long-form deep research.
The model substantially outperforms existing open deep research models on four long-form benchmarks.
Benchmarks cover science, healthcare, and general domains.
RLER uses rubrics that co-evolve with the policy model during training.
Rubrics incorporate information from search and contrasting model responses.
The paper is available on arXiv with ID 2511.19399.
The research addresses the limitations of training on short-form QA tasks with verifiable rewards.
