ARTFEED — Contemporary Art Intelligence

DR Tulu-8B: Open-Source AI for Long-Form Research via Reinforcement Learning with Evolving Rubrics

ai-technology · 2026-05-18

Researchers have introduced Reinforcement Learning with Evolving Rubrics (RLER), a training method for deep research agents, and used it to build DR Tulu-8B, which they describe as the first fully open model trained for open-ended, long-form deep research. Prior open deep research models are mostly trained on short-form QA tasks with verifiable rewards. RLER instead constructs and maintains rubrics that co-evolve with the policy model during training. This lets the rubrics incorporate information the model has newly found through search and contrast different model responses, which improves fact-checking. Across four long-form benchmarks in science, healthcare, and general domains, DR Tulu-8B substantially outperforms existing open deep research models. The work is detailed in a paper on arXiv (2511.19399).
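To make the idea of rubrics that co-evolve with the policy more concrete, here is a minimal Python sketch. It is not the RLER implementation and does not reproduce the paper's method. It assumes three simplifications, all hypothetical: each rubric is checked by simple keyword matching instead of an LLM judge; newly proposed rubrics (which in RLER would draw on search results and contrasting responses) are passed in as a ready-made `proposed` list; and the pool keeps only rubrics that separate the current responses, meaning some responses satisfy them and others do not. The reward for a response is the weighted fraction of rubrics it satisfies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Rubric:
    description: str
    keywords: list[str]  # hypothetical stand-in for an LLM judge's criterion
    weight: float = 1.0


def satisfies(response: str, rubric: Rubric) -> bool:
    # Toy check: the response must mention every keyword (case-insensitive).
    text = response.lower()
    return all(k.lower() in text for k in rubric.keywords)


def rubric_reward(response: str, rubrics: list[Rubric]) -> float:
    # Weighted fraction of rubrics satisfied; 0.0 when the pool is empty.
    total = sum(r.weight for r in rubrics)
    if total == 0:
        return 0.0
    earned = sum(r.weight for r in rubrics if satisfies(response, r))
    return earned / total


def evolve_rubrics(
    rubrics: list[Rubric],
    responses: list[str],
    proposed: list[Rubric],
    max_size: int = 8,
) -> list[Rubric]:
    # Merge newly proposed rubrics into the pool, skipping duplicates.
    seen = {r.description for r in rubrics}
    pool = rubrics + [r for r in proposed if r.description not in seen]

    # Assumption: keep only rubrics that distinguish between current
    # responses, since a rubric every response passes (or fails) gives
    # no learning signal.
    def discriminates(r: Rubric) -> bool:
        hits = [satisfies(resp, r) for resp in responses]
        return any(hits) and not all(hits)

    return [r for r in pool if discriminates(r)][:max_size]


# Toy usage: two sampled responses to the same research question.
responses = [
    "The answer cites source A and explains method B.",
    "The answer cites source A.",
]
rubrics = [Rubric("Cites source A", ["source a"])]
proposed = [Rubric("Explains method B", ["method b"])]

new_rubrics = evolve_rubrics(rubrics, responses, proposed)
rewards = [rubric_reward(resp, new_rubrics) for resp in responses]
print([r.description for r in new_rubrics], rewards)
```

In this toy run, "Cites source A" is dropped because both responses satisfy it, while "Explains method B" is kept because only the first response does. The first response therefore receives a higher reward. This is the kind of on-policy, discriminative signal that evolving the rubrics alongside the policy is meant to provide.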

Key facts

  • RLER stands for Reinforcement Learning with Evolving Rubrics.
  • DR Tulu-8B is the first fully open model for open-ended, long-form deep research.
  • The model outperforms existing open deep research models on four benchmarks.
  • Benchmarks cover science, healthcare, and general domains.
  • RLER uses rubrics that co-evolve with the policy model during training.
  • Rubrics incorporate information from search and contrasting model responses.
  • The paper is available on arXiv with ID 2511.19399.
  • The work addresses the limitations of training deep research models only on short-form QA tasks with verifiable rewards.

Entities

Institutions

  • arXiv

Sources