AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning
AdaGamma is a deep actor-critic method for state-dependent discounting in reinforcement learning. It learns a state-dependent discount function jointly with the policy and value function, paired with a return-consistency objective that regularizes the backup structure and prevents instability and TD-error collapse. The method integrates into both Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), yielding consistent improvements on continuous-control benchmarks and statistically significant gains in an online A/B test. Theoretical analysis establishes well-posedness of the induced Bellman operator under suitable conditions.
Key facts
- AdaGamma is a deep actor-critic method for state-dependent discounting.
- It learns a state-dependent discount function with a return-consistency objective.
- The method prevents instability and TD-error collapse.
- Integrates into both SAC and PPO.
- Shows consistent improvements on continuous-control benchmarks.
- Achieves statistically significant gains in an online A/B test.
- Theoretical analysis establishes well-posedness of the Bellman operator.
- Published as arXiv:2605.06149.
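The core mechanism described above can be sketched in code. The snippet below is a minimal, hypothetical illustration (not the paper's actual implementation or API): a small network maps states to per-state discounts in (0, 1), the TD target bootstraps through that learned discount, and a return-consistency term is read here as keeping the critic's value consistent with the one-step backup under the learned discount. All network shapes and function names are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a state-dependent discount backup.
# gamma_net maps a state to a per-state discount in (0, 1).
state_dim = 4
gamma_net = nn.Sequential(nn.Linear(state_dim, 16), nn.Tanh(), nn.Linear(16, 1))
critic = nn.Sequential(nn.Linear(state_dim, 16), nn.Tanh(), nn.Linear(16, 1))

def td_target(reward, next_state):
    """Bootstrapped target r + gamma(s') * V(s') with a learned discount."""
    gamma_s = torch.sigmoid(gamma_net(next_state))   # state-dependent discount in (0, 1)
    with torch.no_grad():
        v_next = critic(next_state)                  # bootstrap value, gradient stopped
    return reward + gamma_s * v_next

def return_consistency_loss(states, rewards, next_states):
    """One plausible reading of the return-consistency objective:
    the critic's value at s should match the one-step backup through gamma(s')."""
    targets = td_target(rewards, next_states)
    return ((critic(states) - targets) ** 2).mean()

# Usage: compute the regularizer on a random batch of transitions.
batch_s = torch.randn(8, state_dim)
batch_r = torch.randn(8, 1)
batch_ns = torch.randn(8, state_dim)
loss = return_consistency_loss(batch_s, batch_r, batch_ns)
```

Squashing the discount through a sigmoid keeps every per-state discount strictly inside (0, 1), which is one simple way to keep the induced Bellman backup a contraction.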