Cross-Timestep Delays in Multi-Agent RL: Communication Gain vs Delay Cost

other · 2026-05-27

A recent arXiv paper (2604.03785) addresses cross-timestep communication delays in cooperative multi-agent reinforcement learning under partial observability. The authors introduce the delayed-communication partially observable Markov game (DeComm-POMG) and decompose a message's effect into communication gain and delay cost, which yields the CGDC metric. They establish a value-loss bound: the degradation caused by delayed messages is bounded by a discounted sum of the information gap between the action distributions induced by timely and delayed messages. To handle temporal misalignment and stale information in multi-agent coordination, they propose CDCMA, an actor-critic framework that requests a message only when its predicted CGDC is positive and that also anticipates future observations.
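
The value-loss bound described above can be written schematically as follows. This is only a sketch of the general shape implied by the summary, not the paper's exact statement: the constant C, the divergence D, the delay d, the policy notation, and the symbols for observations and messages are all assumptions introduced here for illustration.

```latex
% Schematic form only; C, D, d, and the conditioning variables are assumed.
\[
  V^{\text{timely}} - V^{\text{delayed}}
  \;\le\;
  C \sum_{t=0}^{\infty} \gamma^{t}\, \Delta_t ,
  \qquad
  \Delta_t = D\!\left( \pi(\cdot \mid o_t, m_t),\; \pi(\cdot \mid o_t, m_{t-d}) \right)
\]
```

Here \gamma is the discount factor, m_t is a message that arrives on time, m_{t-d} is the same channel's message delayed by d steps, and \Delta_t is the per-step gap between the two resulting action distributions. Under this reading, a delay costs nothing at steps where it does not change the action distribution.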

Key facts

arXiv:2604.03785v2
Introduces DeComm-POMG formalization
Decomposes message effect into communication gain and delay cost (CGDC)
Establishes value-loss bound for delayed messages
Proposes CDCMA actor-critic framework
CDCMA requests messages only when predicted CGDC positive
Addresses cross-timestep delays in cooperative MARL
Focuses on partial observability settings
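
The gating rule and the discounted gap from the facts above could be sketched in Python roughly as follows. This is a minimal illustration and not the paper's implementation: it assumes CGDC is the predicted gain minus the predicted delay cost, uses total variation distance as a stand-in for the information gap, and every name (`MessageEstimate`, `should_request`, `tv_distance`, `discounted_gap`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MessageEstimate:
    """Hypothetical container for an agent's predictions about one message."""
    gain: float        # predicted communication gain
    delay_cost: float  # predicted cost of the message arriving late


def predicted_cgdc(est: MessageEstimate) -> float:
    # Assumption: CGDC is the net effect, gain minus delay cost.
    return est.gain - est.delay_cost


def should_request(est: MessageEstimate) -> bool:
    # Request the message only when the predicted CGDC is strictly positive.
    return predicted_cgdc(est) > 0.0


def tv_distance(p: Sequence[float], q: Sequence[float]) -> float:
    # Total variation distance between two discrete action distributions.
    # Assumption: used here as a stand-in for the paper's information gap.
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def discounted_gap(gaps: Sequence[float], gamma: float) -> float:
    # Discounted sum of per-step gaps: sum over t of gamma**t * gaps[t].
    return sum((gamma ** t) * g for t, g in enumerate(gaps))


if __name__ == "__main__":
    est = MessageEstimate(gain=0.5, delay_cost=0.2)
    print("request message:", should_request(est))

    timely = [0.5, 0.5]   # action distribution given a timely message
    delayed = [0.9, 0.1]  # action distribution given a delayed message
    gap = tv_distance(timely, delayed)
    print("discounted gap over 3 steps:", discounted_gap([gap] * 3, gamma=0.9))
```

In a real agent the gain and delay cost would be outputs of learned predictors; fixed numbers are used here only to show the decision rule.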
