DR.Q Algorithm for Sample-Efficient Continuous Control
A new algorithm called Debiased model-based Representations for Q-learning (DR.Q) has been proposed to improve sample efficiency in continuous control tasks. The method addresses two biases in existing model-based representation approaches: the learned representations often fail to capture sufficient information about the relevant variables, and they overfit to early experiences in the replay buffer. DR.Q explicitly maximizes the mutual information between the representation of the current state-action pair and that of the next state, while minimizing the deviation between the two representations. It also samples transitions with a faded prioritized experience replay scheme, which counteracts overfitting to early experiences. The approach combines the advantages of model-free and model-based methods without the training costs of the latter.
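The source does not reproduce DR.Q's exact objective. Below is a minimal sketch of how a mutual-information term between state-action and next-state representations is commonly implemented, using the InfoNCE lower bound as a stand-in for whatever estimator the paper uses; the module name `RepresentationModule`, the encoders `phi` and `psi`, the layer sizes, and the `temperature` value are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepresentationModule(nn.Module):
    """Illustrative encoders: phi(s, a) for state-action pairs, psi(s') for next states."""
    def __init__(self, state_dim, action_dim, rep_dim=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                                 nn.Linear(256, rep_dim))
        self.psi = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, rep_dim))

    def info_nce_loss(self, state, action, next_state, temperature=0.1):
        """InfoNCE lower bound on I(phi(s,a); psi(s')): the other next states
        in the batch serve as negatives for each state-action pair."""
        z_sa = F.normalize(self.phi(torch.cat([state, action], dim=-1)), dim=-1)
        z_ns = F.normalize(self.psi(next_state), dim=-1)
        logits = z_sa @ z_ns.t() / temperature          # (B, B) similarity matrix
        labels = torch.arange(logits.size(0), device=logits.device)
        # Minimizing this cross-entropy maximizes the InfoNCE MI lower bound.
        return F.cross_entropy(logits, labels)
```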
Key facts
- DR.Q stands for Debiased model-based Representations for Q-learning
- The algorithm maximizes mutual information between current state-action and next state representations
- It minimizes deviations between representations
- It uses faded prioritized experience replay (see the sketch after this list)
- Existing model-based representations can fail to capture sufficient information about relevant variables
- Existing methods can overfit to early experiences
- DR.Q avoids the training costs of model-based methods
- The approach is designed for continuous control tasks
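The summary does not spell out the "faded" sampling rule. A minimal sketch of one plausible reading, where priorities decay with transition age so that early experiences are sampled less often over time, is below; the class name, the multiplicative `fade` schedule, and the buffer API are assumptions, not the paper's definition.

```python
import numpy as np

class FadedPrioritizedReplay:
    """Toy replay buffer whose sampling weights decay with transition age,
    counteracting overfitting to early experiences."""
    def __init__(self, capacity, fade=0.99):
        self.capacity = capacity
        self.fade = fade            # per-step multiplicative priority decay (assumed)
        self.buffer, self.priorities = [], []
        self.step = 0

    def add(self, transition):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append((transition, self.step))  # store insertion time
        self.priorities.append(1.0)
        self.step += 1

    def sample(self, batch_size):
        # Fade each priority according to how long ago the transition was stored.
        ages = np.array([self.step - t for (_, t) in self.buffer])
        faded = np.array(self.priorities) * (self.fade ** ages)
        probs = faded / faded.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i][0] for i in idx]
```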
Entities
Institutions
- arXiv