DR.Q Algorithm for Sample-Efficient Continuous Control
A new algorithm called Debiased model-based Representations for Q-learning (DR.Q) has been proposed to improve sample efficiency in continuous control tasks. The method addresses two biases in existing model-based representation approaches: the learned representations often fail to capture sufficient information about the relevant variables, and they overfit to early experiences in the replay buffer. DR.Q explicitly maximizes the mutual information between the representation of the current state-action pair and that of the next state, while minimizing the deviation between the two representations. It also samples transitions with a faded prioritized experience replay scheme, which counteracts overfitting to early experiences. The approach combines the advantages of model-free and model-based methods without the training costs of the latter.
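The source does not reproduce DR.Q's exact objective. Below is a minimal sketch of how a mutual-information term between state-action and next-state representations is commonly implemented, using the InfoNCE lower bound as a stand-in for whatever estimator the paper uses; the module name `RepresentationModule`, the encoders `phi` and `psi`, the layer sizes, and the `temperature` value are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepresentationModule(nn.Module):
    """Illustrative encoders: phi(s, a) for state-action pairs, psi(s') for next states."""
    def __init__(self, state_dim, action_dim, rep_dim=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                                 nn.Linear(256, rep_dim))
        self.psi = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, rep_dim))

    def info_nce_loss(self, state, action, next_state, temperature=0.1):
        """InfoNCE lower bound on I(phi(s,a); psi(s')): the other next states
        in the batch serve as negatives for each state-action pair."""
        z_sa = F.normalize(self.phi(torch.cat([state, action], dim=-1)), dim=-1)
        z_ns = F.normalize(self.psi(next_state), dim=-1)
        logits = z_sa @ z_ns.t() / temperature          # (B, B) similarity matrix
        labels = torch.arange(logits.size(0), device=logits.device)
        # Minimizing this cross-entropy maximizes the InfoNCE MI lower bound.
        return F.cross_entropy(logits, labels)
```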
Key facts
- DR.Q stands for Debiased model-based Representations for Q-learning
- The algorithm maximizes mutual information between current state-action and next state representations
- It minimizes deviations between representations
- It uses faded prioritized experience replay (see the sketch after this list)
- Existing model-based representations can fail to capture sufficient information about relevant variables
- Existing methods can overfit to early experiences
- DR.Q avoids the training costs of model-based methods
- The approach is designed for continuous control tasks
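The summary does not spell out the "faded" sampling rule. A minimal sketch of one plausible reading, where priorities decay with transition age so that early experiences are sampled less often over time, is below; the class name, the multiplicative `fade` schedule, and the buffer API are assumptions, not the paper's definition.

```python
import numpy as np

class FadedPrioritizedReplay:
    """Toy replay buffer whose sampling weights decay with transition age,
    counteracting overfitting to early experiences."""
    def __init__(self, capacity, fade=0.99):
        self.capacity = capacity
        self.fade = fade            # per-step multiplicative priority decay (assumed)
        self.buffer, self.priorities = [], []
        self.step = 0

    def add(self, transition):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append((transition, self.step))  # store insertion time
        self.priorities.append(1.0)
        self.step += 1

    def sample(self, batch_size):
        # Fade each priority according to how long ago the transition was stored.
        ages = np.array([self.step - t for (_, t) in self.buffer])
        faded = np.array(self.priorities) * (self.fade ** ages)
        probs = faded / faded.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i][0] for i in idx]
```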
Entities
Institutions
- arXiv