ARTFEED — Contemporary Art Intelligence

AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning

ai-technology · 2026-05-09

AdaGamma is a deep actor-critic method for state-dependent discounting in reinforcement learning. It learns a state-dependent discount function alongside a return-consistency objective that regularizes the backup structure, preventing instability and TD-error collapse. The method integrates into both SAC and PPO and shows consistent improvements on continuous-control benchmarks, with statistically significant gains in an online A/B test. Theoretical analysis establishes well-posedness of the induced Bellman operator under suitable conditions.
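The core mechanical change is small: instead of a fixed scalar discount, the one-step TD target uses a discount predicted for the successor state. The sketch below illustrates that backup under stated assumptions; `gamma_net` and `value_net` are hypothetical stand-ins (the paper's actual architectures and the return-consistency objective are not reproduced here), and the squashing of the discount into (0, γ_max) is one common way to keep the backup stable, not necessarily AdaGamma's.

```python
import numpy as np

def td_target(reward, next_state, done, gamma_net, value_net):
    """One-step TD target r + gamma(s') * V(s'), with the discount
    predicted per state rather than fixed globally.

    gamma_net and value_net are illustrative stand-ins for learned
    networks; on terminal transitions the bootstrap term is dropped.
    """
    g = 0.0 if done else float(gamma_net(next_state))
    return reward + g * float(value_net(next_state))

# Toy stand-ins: the discount is squashed into (0, gamma_max) so that
# every state-dependent backup contracts (a sufficient, standard condition).
GAMMA_MAX = 0.99
gamma_net = lambda s: GAMMA_MAX / (1.0 + np.exp(-np.dot(s, [0.5, -0.3])))
value_net = lambda s: np.dot(s, [1.0, -2.0])

s_next = np.array([0.2, -0.1])
y = td_target(1.0, s_next, done=False, gamma_net=gamma_net, value_net=value_net)
```

With a fixed discount this reduces to the usual SAC/PPO target, so the change is confined to how the bootstrap term is weighted.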

Key facts

  • AdaGamma is a deep actor-critic method for state-dependent discounting.
  • It learns a state-dependent discount function with a return-consistency objective.
  • The method prevents instability and TD-error collapse.
  • Integrates into both SAC and PPO.
  • Shows consistent improvements on continuous-control benchmarks.
  • Achieves statistically significant gains in an online A/B test.
  • Theoretical analysis establishes well-posedness of the Bellman operator.
  • Published on arXiv with ID 2605.06149.
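The well-posedness claim likely rests on a standard argument (the summary does not state the paper's exact conditions): if the learned discount is uniformly bounded below one, the state-dependent Bellman operator remains a sup-norm contraction, so a unique fixed point exists. A minimal sketch in LaTeX, assuming bounded rewards:

```latex
(\mathcal{T} V)(s) = \max_{a}\, \mathbb{E}\big[\, r(s,a) + \gamma(s')\, V(s') \,\big],
\qquad \bar\gamma := \sup_{s} \gamma(s) < 1
\;\Longrightarrow\;
\|\mathcal{T} V_1 - \mathcal{T} V_2\|_\infty \le \bar\gamma \,\|V_1 - V_2\|_\infty .
```

By the Banach fixed-point theorem, the contraction gives a unique value function to which iterated backups converge, which is the sense in which the induced operator is well-posed.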

Entities

Institutions

  • arXiv

Sources