AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning
AdaGamma is a deep actor-critic method for state-dependent discounting in reinforcement learning. It learns a state-dependent discount function jointly with the policy and value function, paired with a return-consistency objective that regularizes the backup structure and prevents instability and TD-error collapse. The method integrates into both Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), yielding consistent improvements on continuous-control benchmarks and statistically significant gains in an online A/B test. Theoretical analysis establishes well-posedness of the induced Bellman operator under suitable conditions.
Key facts
- AdaGamma is a deep actor-critic method for state-dependent discounting.
- It learns a state-dependent discount function with a return-consistency objective.
- The method prevents instability and TD-error collapse.
- Integrates into both SAC and PPO.
- Shows consistent improvements on continuous-control benchmarks.
- Achieves statistically significant gains in an online A/B test.
- Theoretical analysis establishes well-posedness of the Bellman operator.
- Published as arXiv:2605.06149.
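The core mechanism described above can be sketched in code. The snippet below is a minimal, hypothetical illustration (not the paper's actual implementation or API): a small network maps states to per-state discounts in (0, 1), the TD target bootstraps through that learned discount, and a return-consistency term is read here as keeping the critic's value consistent with the one-step backup under the learned discount. All network shapes and function names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a state-dependent discount backup.
# gamma_net maps a state to a per-state discount in (0, 1).
state_dim = 4
gamma_net = nn.Sequential(nn.Linear(state_dim, 16), nn.Tanh(), nn.Linear(16, 1))
critic = nn.Sequential(nn.Linear(state_dim, 16), nn.Tanh(), nn.Linear(16, 1))

def td_target(reward, next_state):
    """Bootstrapped target r + gamma(s') * V(s') with a learned discount."""
    gamma_s = torch.sigmoid(gamma_net(next_state))   # state-dependent discount in (0, 1)
    with torch.no_grad():
        v_next = critic(next_state)                  # bootstrap value, gradient stopped
    return reward + gamma_s * v_next

def return_consistency_loss(states, rewards, next_states):
    """One plausible reading of the return-consistency objective:
    the critic's value at s should match the one-step backup through gamma(s')."""
    targets = td_target(rewards, next_states)
    return ((critic(states) - targets) ** 2).mean()

# Usage: compute the regularizer on a random batch of transitions.
batch_s = torch.randn(8, state_dim)
batch_r = torch.randn(8, 1)
batch_ns = torch.randn(8, state_dim)
loss = return_consistency_loss(batch_s, batch_r, batch_ns)
```

Squashing the discount through a sigmoid keeps every per-state discount strictly inside (0, 1), which is one simple way to keep the induced Bellman backup a contraction.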