Deep Double Q-Learning Improves Over Double DQN on Atari Games

ai-technology · 2026-05-18

Deep Double Q-learning (DDQL) is a new deep reinforcement learning approach that targets the overestimation bias persisting in Double DQN by explicitly training two distinct Q-functions with Double Q-learning. Double DQN trains a single action-value function: its online network selects the next action and a lagged target copy of that same network evaluates it, so the two estimators remain correlated. DDQL instead decouples action selection from action evaluation by training two Q-functions that bootstrap off each other. To keep training stable, the algorithm uses lower replay ratios, longer target network update intervals, and shared layers. Evaluated on 57 Atari 2600 games, DDQL outperformed Double DQN on 47 of them.
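The difference between the two kinds of bootstrap target can be illustrated with a minimal Python sketch. The function names, the plain-list representation of next-state action values, and the toy numbers are illustrative assumptions, not the paper's code. The article also does not say whether DDQL evaluates with the online or the target copy of the other Q-function, so that choice is left to whatever values the caller passes in.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value; the first index wins on ties."""
    return max(range(len(values)), key=lambda i: values[i])


def double_dqn_target(reward: float, done: float, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Double DQN: the online network picks a', its lagged target copy scores it."""
    a_star = argmax(q_online_next)
    return reward + gamma * (1.0 - done) * q_target_next[a_star]


def double_q_targets(reward: float, done: float, gamma: float,
                     q1_next: Sequence[float],
                     q2_next: Sequence[float]) -> tuple[float, float]:
    """Double Q-learning style targets for two separately trained Q-functions.

    Q1's target uses Q1 to select a' and Q2 to evaluate it; Q2's target does
    the reverse. Whether q1_next / q2_next come from online or target copies
    is an assumption left to the caller (the article does not specify it).
    """
    not_done = 1.0 - done
    y1 = reward + gamma * not_done * q2_next[argmax(q1_next)]
    y2 = reward + gamma * not_done * q1_next[argmax(q2_next)]
    return y1, y2


# Toy next-state action values for a 3-action problem (illustrative numbers).
q1_next = [1.0, 3.0, 2.0]
q2_next = [2.5, 0.5, 2.0]
y1, y2 = double_q_targets(1.0, 0.0, 0.9, q1_next, q2_next)
```

In this toy case, an estimator that both selects and evaluates with Q1 would bootstrap from max(q1_next) = 3.0, whereas cross-evaluation bootstraps from Q2's much lower estimate of that same action (0.5). Cross-evaluation of this kind is how Double Q-learning counters the optimistic bias of the max operator.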

Key facts

DDQL is introduced as a deep RL algorithm that explicitly trains two Q-functions via Double Q-learning.
Double DQN trains only a single action-value function, leading to correlated estimators and persistent overestimation.
DDQL uses lower replay ratios, longer target network update intervals, and shared layers for training stability.
Experiments were conducted on 57 Atari 2600 games.
DDQL outperformed Double DQN on 47 out of 57 games.
The paper is available on arXiv with ID 2507.00275.
Double Q-learning is a tabular reinforcement learning algorithm that mitigates the maximization bias of Q-learning.
DDQL carries Double Q-learning's decoupling of action selection and action evaluation in bootstrap targets over to deep reinforcement learning.
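The tabular Double Q-learning update that DDQL builds on can be sketched as follows. The sketch follows the standard formulation: on each transition, one of two tables is chosen at random, that table selects the greedy next action, and the other table evaluates it. The helper names, the dict-based tables, and the default `alpha` and `gamma` values are illustrative assumptions. The `rng` parameter only needs a `.random()` method, so it can be seeded or stubbed for testing.

```python
import random
from collections import defaultdict
from typing import Hashable, Sequence


def make_table() -> defaultdict:
    """Q-table mapping (state, action) -> value; unseen pairs read as 0.0."""
    return defaultdict(float)


def greedy(q: defaultdict, state: Hashable, actions: Sequence[Hashable]) -> Hashable:
    """Greedy action under table q (the first action wins on ties)."""
    return max(actions, key=lambda a: q[(state, a)])


def double_q_update(qa, qb, s, a, r, s_next, actions, done,
                    alpha=0.1, gamma=0.99, rng=random) -> bool:
    """One tabular Double Q-learning step; returns True if qa was updated."""
    # With probability 1/2 update QA (evaluated by QB), otherwise the reverse.
    if rng.random() < 0.5:
        learner, evaluator = qa, qb
    else:
        learner, evaluator = qb, qa
    if done:
        target = r  # no bootstrap from a terminal next state
    else:
        a_star = greedy(learner, s_next, actions)         # selection by the learner
        target = r + gamma * evaluator[(s_next, a_star)]  # evaluation by the other table
    learner[(s, a)] += alpha * (target - learner[(s, a)])
    return learner is qa
```

Because each table is only ever scored by the other, an action whose value one table happens to overestimate is not automatically scored with that same inflated value. DDQL, as described above, keeps this pairing of two separately trained Q-functions instead of reusing one network's lagged copy.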

Sources

arXiv cs.AI — 2026-05-18