ARTFEED — Contemporary Art Intelligence

R2R2: Redundancy Reduction for Robust SPL in RL

other · 2026-05-16

A new regularization method, R2R2 (Robust Representation via Redundancy Reduction), addresses overfitting in Self-Predictive Learning (SPL) for reinforcement learning under high Update-to-Data (UTD) ratios. The method is theoretically grounded: standard zero-centering conflicts with SPL's spectral properties, and R2R2 resolves this conflict with a non-centered objective. R2R2 is validated on SPL-native algorithms such as TD7 and extended to SimbaV2, yielding SimbaV2-SPL. Experiments across 11 continuous control tasks show that R2R2 effectively mitigates overfitting, particularly at high UTD rates.
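
The summary stops short of the exact loss, so the following is a minimal sketch of a redundancy-reduction regularizer with the zero-centering step removed, assuming the Barlow-Twins-style formulation the method's name suggests. The function name, the unit-second-moment normalization, and the lambda_offdiag weight are illustrative assumptions, not details taken from the paper.

    import torch

    def redundancy_reduction_loss(z: torch.Tensor,
                                  lambda_offdiag: float = 5e-3) -> torch.Tensor:
        """z: (batch, dim) latent features from the SPL encoder.

        Hypothetical sketch: penalizes off-diagonal redundancy in the
        NON-centered second-moment matrix of the features, since the
        summary notes that standard zero-centering conflicts with SPL's
        spectral properties.
        """
        b, _ = z.shape
        # Scale each feature to unit second moment WITHOUT subtracting
        # the batch mean, i.e. keep non-centered statistics.
        z = z / (z.pow(2).mean(dim=0, keepdim=True).sqrt() + 1e-8)
        c = (z.T @ z) / b  # non-centered feature correlation matrix (dim, dim)
        on_diag = (torch.diagonal(c) - 1.0).pow(2).sum()
        off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
        return on_diag + lambda_offdiag * off_diag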

Key facts

  • R2R2 is a regularization method for Self-Predictive Learning (SPL) in reinforcement learning.
  • It targets representation-level instability under high Update-to-Data (UTD) regimes.
  • Standard zero-centering conflicts with SPL's spectral properties; R2R2 uses a non-centered objective.
  • R2R2 is verified on SPL-native algorithms like TD7.
  • R2R2 is extended to SimbaV2, creating SimbaV2-SPL.
  • Experiments across 11 continuous control tasks confirm mitigation of overfitting.
  • High UTD ratios induce overfitting in data-scarce domains such as real-world robotics; a toy loop illustrating UTD reuse follows this list.
  • The work is published on arXiv with ID 2605.14026.
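
To make the UTD point concrete, here is a toy training loop built from placeholder components (a toy encoder and random stand-in transitions, none of them from the paper or from TD7/SimbaV2). It shows how a high UTD ratio reuses each collected transition for many gradient updates, with the sketched regularizer above added to the representation loss.

    import torch
    import torch.nn as nn

    # Toy encoder and replay data; all sizes are arbitrary placeholders.
    encoder = nn.Sequential(nn.Linear(17, 64), nn.ReLU(), nn.Linear(64, 32))
    opt = torch.optim.Adam(encoder.parameters(), lr=3e-4)
    replay = [torch.randn(17)]  # stand-in replay buffer

    UTD_RATIO = 8  # "high" UTD: 8 gradient updates per environment transition

    for env_step in range(50):
        replay.append(torch.randn(17))  # stand-in for one collected transition
        for _ in range(UTD_RATIO):      # each transition is reused many times
            idx = torch.randint(len(replay), (min(64, len(replay)),))
            obs = torch.stack([replay[i] for i in idx.tolist()])
            z = encoder(obs)
            # A full agent would add TD and self-predictive losses here;
            # only the regularization term (defined in the sketch above) is shown.
            loss = redundancy_reduction_loss(z)
            opt.zero_grad()
            loss.backward()
            opt.step()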

Entities

Institutions

  • arXiv
