Qreg+NWLU: Novel Data Rehearsal Method for Continual Reinforcement Learning

other · 2026-05-23

A paper posted to arXiv (2605.22454) introduces Qreg+NWLU, a value-based data rehearsal technique for reducing catastrophic forgetting in Continual Reinforcement Learning (CRL). Existing CRL methods mostly build on policy gradient algorithms and regularize only the actor, leaving the value function approximation unprotected. The authors address this gap by studying data rehearsal for Deep Q-Networks, applying Q-value regularization in environments where task sequences recur. Qreg+NWLU has two key components: a continuous data rehearsal process that keeps gathering and refreshing stored Q-values during training, and 'No-Wait' regularization, which is applied from the start of training rather than only after the initial task. The study also notes that multi-cyclic environments aggravate both the forgetting and plasticity problems, a significant yet underexamined real-world challenge.
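To make the two ideas more concrete, here is a minimal sketch in plain Python of what Q-value regularization with a continuously refreshed rehearsal store might look like. It is not the paper's implementation: the tabular Q-function, the names `RehearsalBuffer` and `regularized_loss`, the squared-error penalty, the random eviction rule, and the parameters `reg_weight`, `no_wait`, and `tasks_completed` are all assumptions made for illustration, since the summary does not give the actual loss or buffer policy.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RehearsalBuffer:
    """Hypothetical store mapping (state, action) to a remembered Q-value."""
    capacity: int
    entries: dict = field(default_factory=dict)

    def refresh(self, state, action, q_value):
        # "Continuous" rehearsal (assumed reading): entries are inserted or
        # overwritten throughout training, not only at task boundaries.
        key = (state, action)
        if key not in self.entries and len(self.entries) >= self.capacity:
            # Assumed eviction rule: drop a random entry when full.
            self.entries.pop(random.choice(list(self.entries)))
        self.entries[key] = q_value

    def sample(self, k):
        items = list(self.entries.items())
        return random.sample(items, min(k, len(items)))


def regularized_loss(q_table, batch, buffer, gamma=0.99, reg_weight=1.0,
                     rehearsal_size=4, no_wait=True, tasks_completed=0):
    """Mean squared TD error plus a penalty for drifting from stored Q-values.

    q_table: list of lists, q_table[state][action] -> float (tabular stand-in
             for a Deep Q-Network).
    batch:   list of (state, action, reward, next_state, done) transitions.
    """
    td = 0.0
    for s, a, r, s_next, done in batch:
        target = r if done else r + gamma * max(q_table[s_next])
        td += (q_table[s][a] - target) ** 2
    td /= len(batch)

    # "No-Wait" (assumed reading): the penalty is active from the first update.
    # With no_wait=False it only switches on after one task has been completed.
    if not (no_wait or tasks_completed >= 1):
        return td

    stored = buffer.sample(rehearsal_size)
    if not stored:
        return td
    reg = sum((q_table[s][a] - q_old) ** 2 for (s, a), q_old in stored)
    return td + reg_weight * reg / len(stored)


# Usage: two states, two actions, one remembered Q-value.
q_table = [[0.0, 0.0], [0.0, 0.0]]
buffer = RehearsalBuffer(capacity=8)
buffer.refresh(state=0, action=0, q_value=1.0)
batch = [(0, 0, 1.0, 1, True)]

loss_no_wait = regularized_loss(q_table, batch, buffer, reg_weight=0.5)
loss_waiting = regularized_loss(q_table, batch, buffer, reg_weight=0.5,
                                no_wait=False, tasks_completed=0)
```

In this sketch the only difference between the two calls is whether the penalty term is already active during the first task; in a real Deep Q-Network setup the same structure would apply to network outputs and a gradient step rather than a table lookup.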

Key facts

Paper title: Don't Forget the Critic: Value-Based Data Rehearsal for Multi-Cyclic Continual Reinforcement Learning
arXiv ID: 2605.22454
Announce type: cross
Proposes Qreg+NWLU method
Addresses catastrophic forgetting in CRL
Focuses on value function approximation via data rehearsal
Uses Deep Q-Networks with Q-value regularization
Introduces continuous data rehearsal and No-Wait regularization
Targets multi-cyclic environments with repeating task sequences
