Cross-Timestep Delays in Multi-Agent RL: Communication Gain vs Delay Cost
A recent arXiv paper (2604.03785) studies communication delays that span time steps in cooperative multi-agent reinforcement learning under partial observability. The authors formalize the setting as a delayed-communication partially observable Markov game (DeComm-POMG) and decompose the effect of a message into a communication gain and a delay cost, which together give the CGDC metric. They establish a value-loss bound: the degradation caused by delayed messages is bounded by a discounted sum of the information gap between the action distributions induced by timely and by delayed messages. To handle temporal misalignment and stale information in multi-agent coordination, they propose CDCMA, an actor-critic framework that requests a message only when its predicted CGDC is positive and that anticipates future observations.
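To make the quantity in the value-loss bound concrete, the following is a minimal sketch of a discounted sum of per-step gaps between two sequences of action distributions. It is an illustration only, not the paper's definition: the summary does not say how the "information gap" is measured, so KL divergence is used here as an assumed stand-in, and the names `kl_divergence`, `discounted_gap`, and the discount `gamma` are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete action distributions given as lists.

    ASSUMPTION: KL is a stand-in for the paper's "information gap",
    whose exact form is not stated in this summary.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def discounted_gap(timely, delayed, gamma=0.95):
    """Discounted sum over time steps of the per-step gap.

    timely[t] and delayed[t] are the action distributions an agent would
    use at step t with a timely or a delayed message, respectively.
    """
    return sum(
        (gamma ** t) * kl_divergence(p, q)
        for t, (p, q) in enumerate(zip(timely, delayed))
    )


# Two steps, two actions: the delayed policy drifts from the timely one.
timely = [[0.9, 0.1], [0.8, 0.2]]
delayed = [[0.6, 0.4], [0.5, 0.5]]
gap = discounted_gap(timely, delayed)
```

When the delayed message induces the same action distributions as the timely one, every per-step term is zero and so is the sum, which matches the intuition that a delay costs nothing if it does not change behaviour.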
Key facts
- arXiv:2604.03785v2
- Introduces DeComm-POMG formalization
- Decomposes message effect into communication gain and delay cost (CGDC)
- Establishes value-loss bound for delayed messages
- Proposes CDCMA actor-critic framework
- CDCMA requests a message only when its predicted CGDC is positive
- Addresses cross-timestep delays in cooperative MARL
- Focuses on partial observability settings
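The request rule listed above can be sketched in a few lines. This is a hedged illustration, not CDCMA itself: it assumes CGDC is the predicted communication gain minus the predicted delay cost (the summary only says the metric is built from the two), and it takes the predictions as given numbers rather than outputs of a learned model. `MessageEstimate` and `senders_to_query` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MessageEstimate:
    sender: str
    predicted_gain: float        # expected benefit of receiving the message
    predicted_delay_cost: float  # expected harm from the message being stale

    @property
    def cgdc(self) -> float:
        # ASSUMPTION: CGDC is modelled as gain minus delay cost.
        return self.predicted_gain - self.predicted_delay_cost


def senders_to_query(estimates):
    """Return the senders whose predicted CGDC is strictly positive."""
    return [e.sender for e in estimates if e.cgdc > 0]


estimates = [
    MessageEstimate("agent_a", predicted_gain=0.5, predicted_delay_cost=0.2),
    MessageEstimate("agent_b", predicted_gain=0.1, predicted_delay_cost=0.4),
    MessageEstimate("agent_c", predicted_gain=0.3, predicted_delay_cost=0.3),
]
requested = senders_to_query(estimates)
```

Here only `agent_a` is queried: `agent_b` has a delay cost that outweighs its gain, and `agent_c` breaks even, which the strict inequality treats as not worth a request.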
Entities
Institutions
- arXiv