Soft DPG with Gaussian Smoothing for Continuous Control
A new reinforcement learning algorithm, Soft Deep Deterministic Policy Gradient (Soft DDPG), addresses a key limitation of standard deterministic policy gradient (DPG): its policy update requires a critic that is differentiable with respect to actions. With sparse or discrete rewards this assumption breaks down, leaving the policy gradient ill-defined. Soft DDPG instead applies Gaussian smoothing to the Bellman equation, defining a novel action-value function whose policy update has no explicit dependence on critic action-gradients, so gradients remain well-defined even for non-smooth Q-functions. The framework is detailed in arXiv:2605.06228.
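The summary does not spell out the smoothed objective, but a standard Gaussian-smoothing construction consistent with this description (the notation Q_sigma, sigma, and epsilon is assumed here, not taken from the paper) would be:

```latex
Q_\sigma(s, a) = \mathbb{E}_{\epsilon \sim \mathcal{N}(0,\, \sigma^2 I)}\big[\, Q(s,\, a + \epsilon) \,\big],
\qquad
\nabla_a Q_\sigma(s, a) = \frac{1}{\sigma^2}\, \mathbb{E}_{\epsilon \sim \mathcal{N}(0,\, \sigma^2 I)}\big[\, \epsilon\, Q(s,\, a + \epsilon) \,\big].
```

The second identity holds whenever the expectation exists, even for discontinuous Q, which is the sense in which smoothing yields well-defined gradients without ever differentiating the critic with respect to actions.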
Key facts
- Standard DPG requires differentiable critics for policy updates.
- This assumption is violated with sparse or discrete rewards.
- Soft DDPG uses a smoothed Bellman equation via Gaussian smoothing.
- It defines a novel action-value function.
- Soft DDPG eliminates explicit dependence on critic action-gradients.
- Gradients remain well-defined even for non-smooth Q-functions (see the sketch after this list).
- The algorithm is called Soft Deep Deterministic Policy Gradient (Soft DDPG).
- The paper is available on arXiv with ID 2605.06228.
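To make the mechanism concrete, here is a minimal numerical sketch of the Gaussian-smoothing idea the summary describes. It is not the paper's algorithm: the function name smoothed_action_gradient, the sampling scheme, and the toy q_fn are all illustrative assumptions; only the smoothing identity itself is standard.

```python
import numpy as np

def smoothed_action_gradient(q_fn, state, action, sigma=0.1, n_samples=256, rng=None):
    """Monte Carlo estimate of grad_a Q_sigma(s, a), where
    Q_sigma(s, a) = E_{eps ~ N(0, sigma^2 I)}[Q(s, a + eps)].

    Uses the Gaussian-smoothing identity
        grad_a Q_sigma(s, a) = E[eps * Q(s, a + eps)] / sigma^2,
    which needs only evaluations of Q, never its action-gradient,
    so it stays well-defined even for non-smooth Q. (Illustrative
    sketch, not the paper's method.)
    """
    rng = rng or np.random.default_rng()
    eps = rng.normal(0.0, sigma, size=(n_samples, action.shape[0]))
    # Antithetic pairs (+eps, -eps) reduce estimator variance.
    eps = np.concatenate([eps, -eps], axis=0)
    q_vals = np.array([q_fn(state, action + e) for e in eps])
    return (eps * q_vals[:, None]).mean(axis=0) / sigma**2


if __name__ == "__main__":
    # Toy non-smooth critic: a 0/1 "sparse reward" indicator whose
    # action-gradient is zero or undefined everywhere.
    q_fn = lambda s, a: float(np.linalg.norm(a - 0.5) < 0.2)
    grad = smoothed_action_gradient(q_fn, state=None, action=np.zeros(2),
                                    sigma=0.3, n_samples=4096)
    print(grad)  # points roughly toward a = (0.5, 0.5)
```

Because the estimator uses only Q evaluations, it remains usable for the 0/1 sparse-reward critic in the example, exactly the regime where standard DPG's reliance on critic action-gradients breaks down.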