Treating Teammates as Learnable Components in MARL World Models

other · 2026-06-01

A proposed approach to cooperative multi-agent reinforcement learning (MARL) treats teammates as structured, learnable components of an agent's world model. The architecture builds on a Dreamer-style recurrent state-space model (RSSM) and factorizes its latent state into environment and teammate components. An auxiliary Theory-of-Mind (ToM) head infers latent embeddings of partner behavior (such as character, intent, and anticipated actions) from incomplete trajectories. These teammate latents condition both the actor and the critic, allowing the agent to imagine and adapt to diverse collaborators. The approach addresses a limitation of current world models in handling teammate-induced uncertainty and aims to improve generalization and sample efficiency in cooperative MARL. The paper is available on arXiv under the identifier 2605.31361.
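
To make the data flow concrete, the following is a minimal sketch of a factorized latent whose teammate part comes from a ToM-style head and feeds the actor and critic. It is an illustration under stated assumptions, not the paper's implementation: every name (FactorizedLatent, tom_head, encode), the dimensions, the mean pooling over the partner trajectory, and the random untrained weights are hypothetical, and the recurrent dynamics of the RSSM are omitted entirely.

```python
import math
import random
from dataclasses import dataclass

random.seed(0)

# Hypothetical sizes, chosen only for illustration.
OBS_DIM, ENV_DIM, MATE_DIM, N_ACTIONS = 4, 6, 3, 5


def linear(in_dim, out_dim):
    """Random weight matrix standing in for a learned layer (assumption)."""
    return [[random.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, x, activation=math.tanh):
    """Dense layer without bias: activation(W @ x)."""
    return [activation(sum(w * xi for w, xi in zip(row, x))) for row in weights]


@dataclass
class FactorizedLatent:
    env: list       # environment component of the latent state
    teammate: list  # teammate component, inferred by the ToM head

    def features(self):
        # Actor and critic both see the concatenated latent.
        return self.env + self.teammate


W_env = linear(OBS_DIM, ENV_DIM)
W_tom = linear(OBS_DIM, MATE_DIM)
W_actor = linear(ENV_DIM + MATE_DIM, N_ACTIONS)
W_critic = linear(ENV_DIM + MATE_DIM, 1)


def tom_head(partner_trajectory):
    """Embed a partial partner trajectory by mean pooling (a stand-in choice)."""
    steps = len(partner_trajectory)
    pooled = [sum(obs[i] for obs in partner_trajectory) / steps for i in range(OBS_DIM)]
    return apply(W_tom, pooled)


def encode(own_obs, partner_trajectory):
    """Build the factorized latent from the agent's own and the partner's observations."""
    return FactorizedLatent(env=apply(W_env, own_obs),
                            teammate=tom_head(partner_trajectory))


def actor(latent):
    """Softmax policy conditioned on both latent components."""
    logits = apply(W_actor, latent.features(), activation=lambda v: v)
    peak = max(logits)
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def critic(latent):
    """Scalar value estimate conditioned on both latent components."""
    return apply(W_critic, latent.features(), activation=lambda v: v)[0]


# Same own observation, two different partners observed for different lengths.
own_obs = [0.1, -0.2, 0.3, 0.0]
partner_a = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]]
partner_b = [[0.0, 0.0, 1.0, 0.0]]

latent_a = encode(own_obs, partner_a)
latent_b = encode(own_obs, partner_b)
policy_a = actor(latent_a)
value_a = critic(latent_a)
```

In this sketch the environment component is identical for both partners while the teammate component differs, so any change in the policy or value between the two cases comes only from the inferred teammate embedding. That separation is the property the factorization is meant to provide.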

Key facts

Proposes treating teammates as structured, learnable components within an agent's world model.
Architecture factorizes latent state of Dreamer-style RSSM into environment and teammate components.
Learns an auxiliary Theory-of-Mind (ToM) head to infer latent embeddings of partner behavior.
Teammate latents condition the actor and critic.
Enables agent to imagine and adapt to diverse collaborators.
Addresses limitation of world models in handling teammate-induced uncertainty.
Aims to improve generalization and sample efficiency in cooperative MARL.
Paper available on arXiv: 2605.31361.
