DG-PG: Reducing Multi-Agent RL Noise with Analytical Models
The Descent-Guided Policy Gradient (DG-PG) framework addresses a scalability bottleneck in cooperative multi-agent reinforcement learning (MARL): cross-agent noise. Because agents share a common reward, each agent's learning signal is contaminated by the randomness of all the others, so the variance of the policy-gradient estimator grows in proportion to the number of agents, N. DG-PG exploits the differentiable analytical models often available in engineering domains such as cloud computing and power systems to supply a noise-free descent signal. Folding this signal into standard policy-gradient updates reduces estimator variance from O(N) to O(1) while preserving the equilibria of the cooperative game, and it yields agent-independent sample complexity: performance remains stable as agents are added. The paper is available on arXiv under identifier 2602.20078.
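To make the mechanism concrete, here is a minimal sketch in JAX of a descent step guided by a differentiable analytical model. Everything in it is an illustrative assumption rather than the paper's actual algorithm: the quadratic `system_cost` stands in for an engineering model, and the stochastic policies are collapsed to their per-agent means for brevity. The point is only that differentiating a closed-form model through the policy parameters yields a deterministic, noise-free gradient.

```python
import jax
import jax.numpy as jnp

# Hypothetical analytical system model: a differentiable map from the
# joint action vector to a scalar cooperative cost (think cloud-resource
# latency or power-network loss). The quadratic form is a stand-in.
def system_cost(joint_action):
    return jnp.sum((joint_action - 0.5) ** 2)

# Each agent's Gaussian policy is reduced to its mean parameter here,
# so the "joint action" is simply the vector of per-agent means.
def descent_guided_loss(params):
    return system_cost(params)

n_agents = 8
params = jnp.zeros(n_agents)             # one policy parameter per agent
grad_fn = jax.grad(descent_guided_loss)  # exact gradient, no sampling

lr = 0.1
for _ in range(100):
    # Noise-free descent step: the variance of this update is O(1) -- in
    # fact zero -- no matter how many agents share the reward.
    params = params - lr * grad_fn(params)

print(params)  # converges toward the cooperative optimum at 0.5
```

Because the update comes from `jax.grad` of a closed-form model rather than from sampled returns, no Monte-Carlo noise from other agents enters any agent's gradient.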
Key facts
- DG-PG reduces policy-gradient estimator variance from O(N) to O(1)
- Cross-agent noise scales with the number of agents N in cooperative MARL (see the numerical check after this list)
- Differentiable analytical models from engineering systems provide noise-free descent signals
- DG-PG preserves equilibria of the cooperative game
- Achieves agent-independent sample complexity
- Applicable to cloud computing and power systems
- Published on arXiv with ID 2602.20078
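The O(N) scaling in the list above is easy to check numerically. The sketch below uses a deliberately simple setup of my own choosing, not the paper's benchmark: a shared reward equal to the sum of the agents' Gaussian exploration actions, with the vanilla score-function (REINFORCE) gradient estimated for a single agent as N grows.

```python
import jax
import jax.numpy as jnp

# Monte-Carlo check of the cross-agent noise claim. With shared reward
# r = sum_i a_i and actions a_i ~ N(0, sigma^2), the REINFORCE gradient
# for agent 0 picks up every other agent's randomness.
def reinforce_grad_variance(key, n_agents, sigma=1.0, n_samples=100_000):
    actions = sigma * jax.random.normal(key, (n_samples, n_agents))
    shared_reward = actions.sum(axis=1)   # cooperative (shared) reward
    score = actions[:, 0] / sigma**2      # d log pi_0 / d mu_0 at mu_0 = 0
    return (shared_reward * score).var()  # per-sample estimator variance

key = jax.random.PRNGKey(0)
for n in (1, 4, 16, 64):
    print(n, float(reinforce_grad_variance(key, n)))
# Prints variances of roughly N + 1: the noise grows linearly with the
# number of agents, whereas the analytical-model gradient in the sketch
# above is deterministic and therefore O(1).
```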