Distributed Consensus Enables Scalable Constrained Multi-Agent RL
A novel distributed method for constrained multi-agent reinforcement learning (MARL) integrates state-augmented policy learning with neighbor-to-neighbor consensus on dual variables. The approach targets systems in which agents have separable dynamics but must cooperate to satisfy global resource constraints. Empirically, independent learning fails to produce feasible solutions in these settings. The key contribution is showing that lightweight consensus on Lagrange multipliers is sufficient to enforce global constraints while preserving scalability. Each agent learns a single augmented policy offline, conditioned on its local state and a dual variable that encodes constraint feedback. During execution, agents reach agreement on this dual variable through local communication. The method is validated on a multi-robot resource allocation problem, where it demonstrates feasibility and scalability.
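The summary does not state the underlying optimization problem, so the following is only a generic sketch of how such a setup is commonly written, not the authors' formulation. It assumes a single scalar resource budget B shared by N agents. The symbols V_i (agent i's expected return), G_i (agent i's expected resource use), w_ij (consensus weights over the neighbor set N_i), and eta (dual step size) are all introduced here for illustration.

```latex
\begin{aligned}
\max_{\pi_1,\dots,\pi_N}\ & \sum_{i=1}^{N} V_i(\pi_i)
  \quad \text{s.t.} \quad \sum_{i=1}^{N} G_i(\pi_i) \le B, \\
\mathcal{L}(\pi,\lambda) &= \sum_{i=1}^{N}
  \Big[ V_i(\pi_i) - \lambda \big( G_i(\pi_i) - \tfrac{B}{N} \big) \Big],
  \qquad \lambda \ge 0, \\
\lambda_i^{t+1} &= \Big[ \sum_{j \in \mathcal{N}_i \cup \{i\}} w_{ij}\, \lambda_j^{t}
  + \eta \big( G_i(\pi_i^{\lambda_i^{t}}) - \tfrac{B}{N} \big) \Big]_{+}.
\end{aligned}
```

Under these assumptions the Lagrangian separates across agents for a fixed multiplier, which is why each agent can train one policy conditioned on its local state and the multiplier. The last line is a standard consensus-plus-dual-ascent update in which each agent mixes its neighbors' multipliers and then corrects for its own resource use.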
Key facts
- Method combines state-augmented policy learning with distributed consensus over dual variables.
- Targets systems with separable dynamics and global resource constraints.
- Independent learning fails to produce feasible solutions in such settings.
- Lightweight neighbor-to-neighbor consensus over Lagrange multipliers suffices for global constraint enforcement.
- Each agent learns a single augmented policy offline.
- Policy is conditioned on local state and a dual variable encoding constraint feedback.
- During execution, agents reach agreement on the dual variable through local communication.
- Validated on a multi-robot resource allocation problem.
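The execution-time mechanism in the list above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation. The function `local_usage` is a hypothetical stand-in for a trained dual-conditioned policy: it simply consumes less resource as the multiplier grows. The ring topology, the mixing weights, the demands, the budget, and the step size are likewise invented for illustration.

```python
import numpy as np

def ring_weights(n):
    """Doubly stochastic mixing matrix for a ring: each agent averages
    with its two neighbors (assumed topology, not from the source)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 0.5
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W

def local_usage(demand, lam):
    """Hypothetical stand-in for a dual-conditioned policy: resource use
    shrinks as the multiplier (the 'price' of the resource) grows."""
    return demand / (1.0 + lam)

def run_consensus(demand, budget, step=0.02, iters=5000):
    """Each agent mixes neighbors' multipliers, then takes a projected
    dual-ascent step on its own share of the constraint violation."""
    n = len(demand)
    W = ring_weights(n)
    lam = np.zeros(n)  # one local copy of the dual variable per agent
    for _ in range(iters):
        usage = local_usage(demand, lam)
        lam = np.maximum(0.0, W @ lam + step * (usage - budget / n))
    return lam, local_usage(demand, lam)

demand = np.array([4.0, 6.0, 8.0, 10.0, 12.0, 8.0])  # illustrative values
budget = 24.0                                         # global resource cap

lam, usage = run_consensus(demand, budget)

# Baseline with no dual feedback: every agent takes its full demand.
independent_total = local_usage(demand, np.zeros_like(demand)).sum()

print("multipliers:", np.round(lam, 3))
print("total usage with consensus:", round(float(usage.sum()), 6))
print("total usage without coordination:", independent_total)
```

In this toy setting the uncoordinated baseline exceeds the budget, while the consensus update drives total usage to the budget using only neighbor-to-neighbor exchanges. With a constant step size the local multipliers end up close to each other rather than exactly equal, which is a property of this simplified update and says nothing about the paper's guarantees.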