ARTFEED — Contemporary Art Intelligence

Differential Privacy Guarantees for RL with General Function Approximation

other · 2026-05-11

A new theoretical framework establishes the first differential privacy guarantees for online reinforcement learning with general function approximation, moving beyond earlier tabular and linear settings. The approach combines a batched policy-update scheme with the exponential mechanism and a novel regret analysis, achieving regret of Õ(K^{3/5}) in the model-free setting, which matches the state-of-the-art bounds for the linear case. The work also gives the first regret bound for online RL with batch updates that depends on the coverability complexity measure, complementing results based on the Eluder-Condition class. In addition, the authors identify notable gaps in recent results on private RL with linear function approximation.
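The exponential mechanism mentioned above is a standard differential-privacy primitive (due to McSherry and Talwar): it selects an item with probability exponentially weighted by a utility score, so no single data point changes the selection distribution much. The sketch below is illustrative only and is not the paper's algorithm; the candidate set, utility function, and parameters are placeholder assumptions standing in for the candidate policies scored between batches.

```python
import math
import random

def exponential_mechanism(candidates, utility, epsilon, sensitivity=1.0):
    """Sample one candidate with probability proportional to
    exp(epsilon * utility(c) / (2 * sensitivity)).

    Generic sketch of the exponential mechanism; in the paper's setting
    the candidates would be policies and the utility a data-dependent
    score, but those details are assumptions here, not from the source.
    """
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    m = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    r = random.random() * total
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]  # guard against floating-point rounding
```

For large epsilon the mechanism concentrates on the highest-utility candidate; for small epsilon it approaches uniform sampling, which is the privacy–utility trade-off the regret analysis has to account for.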

Key facts

  • First theoretical guarantees for differentially private online RL with general function approximation
  • Combines batched policy update scheme with exponential mechanism
  • Regret scales as Õ(K^{3/5}) in model-free setting under differential privacy
  • Matches state of the art for linear case
  • First regret bound for online RL with batched updates that depends on the coverability complexity measure
  • Uncovers gaps in recent results for private RL with linear function approximation
  • Extends beyond tabular and linear settings
  • Published on arXiv with ID 2605.07049
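To see why the Õ(K^{3/5}) bound matters: any regret bound sublinear in the number of episodes K means the average per-episode regret vanishes as K grows. The snippet below is a purely numerical illustration of that scaling (log factors ignored); it is not derived from the paper's analysis.

```python
def regret_bound(K, power=3 / 5):
    """Illustrative K^{3/5} regret curve (constants and log factors ignored)."""
    return K ** power

# Per-episode regret shrinks as K grows, unlike a linear-in-K bound.
for K in [10**3, 10**5, 10**7]:
    print(K, regret_bound(K) / K)
```

By contrast, a trivial bound linear in K would leave per-episode regret constant, so the learner would never demonstrably improve.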

Entities

Institutions

  • arXiv

Sources