ARTFEED — Contemporary Art Intelligence

Revisiting Adam for Streaming Reinforcement Learning

other · 2026-05-11

A new study challenges the prevailing reliance on replay buffers in deep reinforcement learning by revisiting online learning from sequential interactions. The authors find that established algorithms such as DQN and C51 perform well without replay buffers when paired with the Adam optimizer. Building on the StreamQ algorithm of Elsayed et al. (2024), they identify two key properties of Adam that enable stable online updates. The work suggests that simpler, more efficient adaptive optimization algorithms for streaming RL are possible.
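
To make the setting concrete, here is a minimal sketch of a streaming value-function update: one gradient step per transition, with Adam and nothing stored. The PyTorch setup, network sizes, and hyperparameters are illustrative assumptions, not the authors' code.

    # Minimal sketch of a streaming (replay-free) DQN-style TD(0) update.
    # Environment dimensions, network, and learning rate are assumptions.
    import torch
    import torch.nn as nn

    obs_dim, n_actions, gamma = 4, 2, 0.99
    q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)

    def streaming_update(obs, action, reward, next_obs, done):
        """One online TD(0) step on a single transition, which is then discarded."""
        q = q_net(obs)[action]                   # Q(s, a) for the taken action
        with torch.no_grad():                    # bootstrapped target, no gradient
            target = reward + gamma * (1.0 - done) * q_net(next_obs).max()
        loss = (q - target) ** 2                 # squared TD error
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                         # Adam's moment estimates are the
        return loss.item()                       # only memory of past transitions

    # Example call on a dummy transition:
    loss = streaming_update(torch.randn(obs_dim), 0, 1.0, torch.randn(obs_dim), 0.0)

Each transition is used exactly once and then dropped, which is precisely the regime that replay buffers were introduced to avoid.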

Key facts

  • Learning from sequential interactions without storing them promises simpler algorithms.
  • Deep RL has relied on replay buffers or parallel sampling to manage instability.
  • Elsayed et al. (2024) introduced StreamQ using eligibility traces and modified optimization.
  • This work investigates DQN and C51 updates in an online setting.
  • DQN and C51 perform well without replay buffers.
  • The Adam optimizer interacts favorably with online updates.
  • Two properties of Adam enable stable online learning (see the sketch after this list).
  • The study is published on arXiv (2605.06764).
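
The digest does not name the two properties the paper singles out. For reference, the standard Adam update (Kingma & Ba, 2015) is sketched below so its moving parts, such as the bias-corrected moment estimates and the per-parameter step-size normalization, are explicit; the function name and defaults here are ours, not the paper's.

    # Reference sketch of the standard Adam update, not the paper's variant.
    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        """One Adam update at step t; m and v are running moment estimates."""
        m = beta1 * m + (1 - beta1) * grad            # EMA of gradients (momentum)
        v = beta2 * v + (1 - beta2) * grad ** 2       # EMA of squared gradients
        m_hat = m / (1 - beta1 ** t)                  # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter scaling
        return theta, m, v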

Entities

Institutions

  • arXiv

Sources

  • arXiv:2605.06764