ARTFEED — Contemporary Art Intelligence

DROL: Dynamic Routing for One-Step Offline RL

other · 2026-04-27

Researchers have introduced DROL, a latent-conditioned one-step actor for offline reinforcement learning that uses top-1 dynamic routing. For each state, the actor samples K candidate actions from a bounded latent prior, assigns each dataset action to its nearest candidate, and updates only that winning candidate with behavior cloning and critic guidance. This sidesteps the compromise built into one-step extraction, where a single output must both improve Q values and stay close to a teacher-supplied endpoint. DROL thereby aims to improve offline reinforcement learning without iterative sampling.

Key facts

  • DROL is a latent-conditioned one-step actor with top-1 dynamic routing
  • The actor samples K candidate actions from a bounded latent prior per state
  • Each dataset action is assigned to its nearest candidate
  • Only the winning candidate is updated with behavior cloning and critic guidance
  • The method avoids compromise between Q improvement and staying near teacher endpoints
  • One-step offline RL actors avoid backpropagation through long iterative samplers
  • The paper is available on arXiv with ID 2604.22229
  • The arXiv announcement type is cross-list
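
The routing-and-update step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the squared-distance behavior-cloning term, and the simple negated-Q critic term are assumptions for clarity, and the candidates are taken as already sampled from the bounded latent prior.

```python
import numpy as np

def drol_routing_step(candidates, dataset_action, q_values, bc_weight=1.0):
    """Hypothetical sketch of one DROL-style top-1 routing update.

    candidates: (K, action_dim) candidate actions sampled from a bounded
        latent prior for a single state.
    dataset_action: (action_dim,) the action observed in the offline dataset.
    q_values: (K,) critic estimates Q(s, a_k), one per candidate.
    Returns the index of the routed winner and its combined loss.
    """
    # Top-1 routing: assign the dataset action to its nearest candidate.
    dists = np.linalg.norm(candidates - dataset_action, axis=1)
    winner = int(np.argmin(dists))
    # Only the winner receives a training signal: a behavior-cloning term
    # pulls it toward the dataset action, and a critic term (assumed here
    # to be the negated Q value) pushes its Q estimate upward.
    bc_loss = dists[winner] ** 2
    critic_loss = -q_values[winner]
    return winner, bc_weight * bc_loss + critic_loss
```

Because the K candidates specialize via winner-take-all assignment, no single output has to trade off Q improvement against closeness to the data, which is the tension the method is designed to avoid.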

Entities

Institutions

  • arXiv

Sources