ARTFEED — Contemporary Art Intelligence

DROL: Dynamic Routing for One-Step Offline RL

other · 2026-04-27

Researchers have introduced DROL, a latent-conditioned one-step actor for offline reinforcement learning that uses top-1 dynamic routing. For each state, the actor samples K candidate actions from a bounded latent prior, assigns each dataset action to its nearest candidate, and updates only that winning candidate with behavior cloning and critic guidance. This sidesteps the compromise built into one-step extraction, where a single output must both improve Q values and stay close to a teacher-supplied endpoint. DROL thereby aims to improve offline reinforcement learning without iterative sampling.

Key facts

  • DROL is a latent-conditioned one-step actor with top-1 dynamic routing
  • The actor samples K candidate actions from a bounded latent prior per state
  • Each dataset action is assigned to its nearest candidate
  • Only the winning candidate is updated with behavior cloning and critic guidance
  • The method avoids compromise between Q improvement and staying near teacher endpoints
  • One-step offline RL actors avoid backpropagation through long iterative samplers
  • The paper is available on arXiv with ID 2604.22229
  • The arXiv announcement type is cross-list
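
The routing-and-update step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the squared-distance behavior-cloning term, and the simple negated-Q critic term are assumptions for clarity, and the candidates are taken as already sampled from the bounded latent prior.

```python
import numpy as np

def drol_routing_step(candidates, dataset_action, q_values, bc_weight=1.0):
    """Hypothetical sketch of one DROL-style top-1 routing update.

    candidates: (K, action_dim) candidate actions sampled from a bounded
        latent prior for a single state.
    dataset_action: (action_dim,) the action observed in the offline dataset.
    q_values: (K,) critic estimates Q(s, a_k), one per candidate.
    Returns the index of the routed winner and its combined loss.
    """
    # Top-1 routing: assign the dataset action to its nearest candidate.
    dists = np.linalg.norm(candidates - dataset_action, axis=1)
    winner = int(np.argmin(dists))
    # Only the winner receives a training signal: a behavior-cloning term
    # pulls it toward the dataset action, and a critic term (assumed here
    # to be the negated Q value) pushes its Q estimate upward.
    bc_loss = dists[winner] ** 2
    critic_loss = -q_values[winner]
    return winner, bc_weight * bc_loss + critic_loss
```

Because the K candidates specialize via winner-take-all assignment, no single output has to trade off Q improvement against closeness to the data, which is the tension the method is designed to avoid.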

Entities

Institutions

  • arXiv

Sources