ARTFEED — Contemporary Art Intelligence

Regularized Centered Emphatic Temporal Difference Learning

other · 2026-05-07

A new reinforcement learning method, Regularized Emphatic Temporal-Difference Learning (RETD), has been introduced to balance stability, projection geometry, and variance in off-policy TD learning with function approximation. Emphatic TD (ETD) improves the projection geometry through its follow-on emphasis, but suffers from high variance. The researchers address this with Bellman-error centering, showing that a straightforward centered emphatic extension introduces an auxiliary coupling that can undermine the positive-definiteness of the ETD key matrix. RETD keeps the follow-on trace intact and regularizes only the auxiliary centering recursion, lifting the lower-right block of the coupled key matrix from 1 to 1+c. The authors derive the RETD core matrix and establish convergence under a conservative sufficient regularization condition. The paper is available on arXiv as 2605.04100.
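The summary above can be made concrete with a minimal sketch of one update step. This is a hypothetical reconstruction from the abstract alone, not the paper's actual algorithm: it assumes linear function approximation, a fixed interest of 1, a scalar auxiliary estimate u of the mean Bellman error for centering, and that the regularization acts by replacing the coefficient 1 on u in the auxiliary recursion with 1 + c (matching the described lift of the lower-right block). All symbol names (w, u, F, rho, alpha, beta, c) are assumptions.

```python
import numpy as np

def retd_step(w, u, F, phi, phi_next, r, gamma, rho, alpha, beta, c):
    """One sketch step of a regularized centered emphatic TD update.

    w        : value weights, v(s) ~= w @ phi(s)   (linear approximation)
    u        : scalar auxiliary estimate of the mean Bellman error (centering)
    F        : follow-on trace
    rho      : importance-sampling ratio pi(a|s) / mu(a|s)
    c        : regularization constant on the auxiliary recursion
    """
    # Follow-on trace; with interest fixed at 1 the emphasis M equals F here.
    F = gamma * rho * F + 1.0
    M = F

    # TD error, then centered by subtracting the running estimate u.
    delta = r + gamma * float(w @ phi_next) - float(w @ phi)
    w = w + alpha * M * rho * (delta - u) * phi

    # Regularized auxiliary centering recursion: the coefficient on u is
    # 1 + c rather than 1, mirroring the lifted lower-right block.
    u = u + beta * (rho * delta - (1.0 + c) * u)
    return w, u, F

# Tiny two-feature example, one step from zero initialization.
w0, u0, F0 = np.zeros(2), 0.0, 0.0
w1, u1, F1 = retd_step(w0, u0, F0,
                       phi=np.array([1.0, 0.0]), phi_next=np.array([0.0, 1.0]),
                       r=1.0, gamma=0.9, rho=1.0,
                       alpha=0.1, beta=0.1, c=0.5)
```

Without the centering term u and the extra c, the step reduces to an ordinary ETD(0) update, which is the contrast the abstract draws.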

Key facts

  • Algorithm: Regularized Emphatic Temporal-Difference Learning (RETD)
  • Addresses off-policy TD learning with function approximation
  • Improves upon Emphatic TD (ETD) by reducing variance
  • Uses Bellman-error centering with regularization
  • Regularization lifts the lower-right block of the key matrix from 1 to 1+c
  • Convergence proven under conservative sufficient condition
  • Paper available on arXiv:2605.04100
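Why lifting the lower-right block matters can be illustrated with a toy 2x2 example (illustrative only; this is not the paper's actual key matrix). An asymmetric off-diagonal coupling between the main and auxiliary blocks can make the symmetric part of the coupled matrix indefinite, and increasing the lower-right entry from 1 to 1 + c restores positive-definiteness. The numbers a, b, c below are made up for the demonstration.

```python
import numpy as np

def min_sym_eig(K):
    """Smallest eigenvalue of the symmetric part of K.

    K satisfies x^T K x > 0 for all nonzero x (the positive-definiteness
    needed for stability arguments) iff this value is > 0.
    """
    return float(np.linalg.eigvalsh((K + K.T) / 2.0).min())

# Toy coupled key matrix: "main" block a, auxiliary coupling b,
# lower-right centering block equal to 1 before regularization.
a, b, c = 0.5, 1.5, 0.5
K = np.array([[a,   b],
              [0.0, 1.0]])

# The coupling makes the symmetric part indefinite...
unstable = min_sym_eig(K)            # negative here

# ...but lifting the lower-right block from 1 to 1 + c fixes it.
K_reg = K + np.diag([0.0, c])
stable = min_sym_eig(K_reg)          # positive here
```

This matches the abstract's mechanism in miniature: the regularization leaves the upper-left (emphatic) block untouched and only strengthens the auxiliary diagonal.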
