DecomposeR: Planner-Centric RL for Deep Research with DAG-Based Rewards

ai-technology · 2026-06-01

DecomposeR is a planner-centric framework for deep research tasks, in which large language models (LLMs) devise an investigation strategy, gather evidence, and write a comprehensive answer. Existing approaches often struggle with credit assignment for planning decisions. DecomposeR addresses this by representing research plans as typed directed acyclic graphs (DAGs), which makes planning explicit and rewardable. The framework trains Qwen3-8B in two stages: planner reinforcement learning (RL) first learns the graph structure and query decomposition, then answerer RL learns to execute individual branches and synthesize the final answer from the established plan. Rewards are assigned to explicit planner tokens and structured outputs. The paper is available on arXiv as 2605.30824.
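
To make the idea of a research plan as a typed DAG more concrete, here is a minimal Python sketch. It is not taken from the paper: the node types (`search`, `compare`, `synthesize`), the `PlanNode` fields, and the `execution_order` helper are all hypothetical names chosen for illustration. The sketch only shows how a plan could be checked for valid types, resolvable dependencies, and acyclicity before its branches are executed.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter

# Hypothetical node types; the paper's actual type vocabulary is not given here.
NODE_TYPES = {"search", "compare", "synthesize"}


@dataclass
class PlanNode:
    node_id: str
    node_type: str          # must be one of NODE_TYPES
    query: str              # the sub-question this node is responsible for
    depends_on: list[str] = field(default_factory=list)


def execution_order(nodes):
    """Validate a plan and return node ids in a dependency-respecting order."""
    by_id = {n.node_id: n for n in nodes}
    if len(by_id) != len(nodes):
        raise ValueError("duplicate node id in plan")
    for n in nodes:
        if n.node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {n.node_type}")
        for dep in n.depends_on:
            if dep not in by_id:
                raise ValueError(f"unknown dependency: {dep}")
    # TopologicalSorter expects a mapping from node to its predecessors.
    sorter = TopologicalSorter({n.node_id: n.depends_on for n in nodes})
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        raise ValueError("plan contains a cycle, so it is not a DAG") from exc


plan = [
    PlanNode("a", "search", "What is method X?"),
    PlanNode("b", "search", "What is method Y?"),
    PlanNode("c", "compare", "How do X and Y differ?", depends_on=["a", "b"]),
]
order = execution_order(plan)
```

In this example the two `search` nodes have no dependencies and can run in either order, while the `compare` node always comes last because it depends on both.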

Key facts

DecomposeR is a planner-centric deep research framework.
Research plans are represented as typed directed acyclic graphs (DAGs).
The model trained is Qwen3-8B.
Training occurs in two stages: planner RL and answerer RL.
Planner RL learns graph structure and query decomposition.
Answerer RL learns branch-level execution and final synthesis.
Rewards are assigned to explicit planner tokens.
Paper available at arXiv:2605.30824.
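
As a rough illustration of what assigning rewards to explicit planner tokens could look like, the toy Python sketch below spreads a plan-level reward over only those tokens tagged as planner output. The role labels, the `planner_token_rewards` function, and the even split are assumptions made for this example; the summary above does not describe the paper's actual reward formulation.

```python
def planner_token_rewards(token_roles, plan_reward):
    """Give each 'planner' token an equal share of plan_reward; other tokens get 0.

    token_roles: list of strings, one per generated token (hypothetical labels).
    plan_reward: scalar score for the plan as a whole (hypothetical).
    """
    n_planner = sum(1 for role in token_roles if role == "planner")
    if n_planner == 0:
        # No planner tokens were emitted, so there is nothing to credit.
        return [0.0] * len(token_roles)
    share = plan_reward / n_planner
    return [share if role == "planner" else 0.0 for role in token_roles]


roles = ["planner", "planner", "answerer", "answerer"]
rewards = planner_token_rewards(roles, plan_reward=1.0)
```

Here only the first two tokens receive credit, which is the kind of separation between planning and answering that the two-stage setup relies on.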
