DG-PG: Reducing Multi-Agent RL Noise with Analytical Models
The Descent-Guided Policy Gradient (DG-PG) framework addresses a scalability bottleneck in cooperative multi-agent reinforcement learning (MARL): cross-agent noise. Because agents share a common reward, each agent's learning signal is contaminated by the randomness of all the others, so the variance of the policy-gradient estimator grows in proportion to the number of agents, N. DG-PG exploits the differentiable analytical models often available in engineering domains such as cloud computing and power systems to supply a noise-free descent signal. Folding this signal into standard policy-gradient updates reduces estimator variance from O(N) to O(1) while preserving the equilibria of the cooperative game, and it yields agent-independent sample complexity: performance remains stable as agents are added. The paper is available on arXiv under identifier 2602.20078.
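To make the mechanism concrete, here is a minimal sketch in JAX of a descent step guided by a differentiable analytical model. Everything in it is an illustrative assumption rather than the paper's actual algorithm: the quadratic `system_cost` stands in for an engineering model, and the stochastic policies are collapsed to their per-agent means for brevity. The point is only that differentiating a closed-form model through the policy parameters yields a deterministic, noise-free gradient.

```python
import jax
import jax.numpy as jnp

# Hypothetical analytical system model: a differentiable map from the
# joint action vector to a scalar cooperative cost (think cloud-resource
# latency or power-network loss). The quadratic form is a stand-in.
def system_cost(joint_action):
    return jnp.sum((joint_action - 0.5) ** 2)

# Each agent's Gaussian policy is reduced to its mean parameter here,
# so the "joint action" is simply the vector of per-agent means.
def descent_guided_loss(params):
    return system_cost(params)

n_agents = 8
params = jnp.zeros(n_agents)             # one policy parameter per agent
grad_fn = jax.grad(descent_guided_loss)  # exact gradient, no sampling

lr = 0.1
for _ in range(100):
    # Noise-free descent step: the variance of this update is O(1) -- in
    # fact zero -- no matter how many agents share the reward.
    params = params - lr * grad_fn(params)

print(params)  # converges toward the cooperative optimum at 0.5
```

Because the update comes from `jax.grad` of a closed-form model rather than from sampled returns, no Monte-Carlo noise from other agents enters any agent's gradient.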
Key facts
- DG-PG reduces policy-gradient estimator variance from O(N) to O(1)
- Cross-agent noise scales with the number of agents N in cooperative MARL (see the numerical check after this list)
- Differentiable analytical models from engineering systems provide noise-free descent signals
- DG-PG preserves equilibria of the cooperative game
- Achieves agent-independent sample complexity
- Applicable to cloud computing and power systems
- Published on arXiv with ID 2602.20078
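The O(N) scaling in the list above is easy to check numerically. The sketch below uses a deliberately simple setup of my own choosing, not the paper's benchmark: a shared reward equal to the sum of the agents' Gaussian exploration actions, with the vanilla score-function (REINFORCE) gradient estimated for a single agent as N grows.

```python
import jax
import jax.numpy as jnp

# Monte-Carlo check of the cross-agent noise claim. With shared reward
# r = sum_i a_i and actions a_i ~ N(0, sigma^2), the REINFORCE gradient
# for agent 0 picks up every other agent's randomness.
def reinforce_grad_variance(key, n_agents, sigma=1.0, n_samples=100_000):
    actions = sigma * jax.random.normal(key, (n_samples, n_agents))
    shared_reward = actions.sum(axis=1)   # cooperative (shared) reward
    score = actions[:, 0] / sigma**2      # d log pi_0 / d mu_0 at mu_0 = 0
    return (shared_reward * score).var()  # per-sample estimator variance

key = jax.random.PRNGKey(0)
for n in (1, 4, 16, 64):
    print(n, float(reinforce_grad_variance(key, n)))
# Prints variances of roughly N + 1: the noise grows linearly with the
# number of agents, whereas the analytical-model gradient in the sketch
# above is deterministic and therefore O(1).
```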