COMPASS: VLM-Based Framework for Multi-Agent Coordination
COMPASS is a multi-agent framework that incorporates Vision-Language Models (VLMs) to support decentralized, closed-loop decision-making in cooperative multi-agent reinforcement learning (MARL). It targets sample efficiency, interpretability, and generalization by dynamically generating and refining interpretable, code-based strategies, which are stored in a skill library bootstrapped from expert demonstrations. A structured multi-hop communication protocol propagates entity information between agents, so that a team can build a coherent shared understanding from partial observations. On the SMACv2 benchmark, COMPASS is reported to improve considerably over existing methods.
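The multi-hop idea can be illustrated with a minimal sketch. The summary does not specify the message format or merge rule, so everything here is an assumption made for illustration: the `EntityInfo` record, the `propagate` function, the fixed neighbor graph, and the "keep the most recent observation" rule are all hypothetical and are not taken from COMPASS itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EntityInfo:
    """Hypothetical record of one observed entity (not the COMPASS format)."""
    entity_id: str
    position: Tuple[float, float]
    observed_at: int  # timestep at which the entity was seen


Beliefs = Dict[str, Dict[str, EntityInfo]]  # agent -> entity_id -> info


def propagate(local_views: Beliefs,
              neighbors: Dict[str, List[str]],
              hops: int) -> Beliefs:
    """Share entity info along the communication graph for `hops` rounds.

    Assumed merge rule: an agent keeps the most recent observation of
    each entity it hears about.
    """
    beliefs = {agent: dict(view) for agent, view in local_views.items()}
    for _ in range(hops):
        # Messages in one round are based on the beliefs at the round's start.
        snapshot = {agent: dict(view) for agent, view in beliefs.items()}
        for agent, peers in neighbors.items():
            for peer in peers:
                for entity_id, info in snapshot[peer].items():
                    known = beliefs[agent].get(entity_id)
                    if known is None or info.observed_at > known.observed_at:
                        beliefs[agent][entity_id] = info
    return beliefs


# Three agents in a chain: a1 <-> a2 <-> a3. Only a1 and a3 see an enemy.
views = {
    "a1": {"enemy_1": EntityInfo("enemy_1", (1.0, 2.0), 5)},
    "a2": {},
    "a3": {"enemy_2": EntityInfo("enemy_2", (7.0, 3.0), 5)},
}
graph = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"]}

after_one_hop = propagate(views, graph, hops=1)
after_two_hops = propagate(views, graph, hops=2)
```

In this toy setup, one hop is enough for the middle agent `a2` to learn about both enemies, while `a1` and `a3` need a second hop to hear about the entity at the far end of the chain. This is the sense in which multi-hop propagation lets agents assemble a shared picture from partial observations.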
Key facts
- COMPASS integrates Vision-Language Models (VLMs) for decentralized, closed-loop decision-making.
- It dynamically generates and refines interpretable, code-based strategies.
- Strategies are stored in a skill library bootstrapped from expert demonstrations.
- A structured multi-hop communication protocol propagates entity information.
- The framework is evaluated on the SMACv2 benchmark.
- COMPASS addresses sample efficiency, interpretability, and generalization in MARL.
- It addresses limitations of text-only LLMs and the difficulties posed by non-Markovian, partially observable tasks.
- The framework enables teams to build coherent understanding from partial observations.
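The skill library described in the list above can be pictured with a small sketch of code-based strategies that are stored, retrieved, and refined. This is an illustration under stated assumptions, not the COMPASS implementation: the `Skill` and `SkillLibrary` classes, the `act(observation)` convention, and the keyword-overlap retrieval (a simple stand-in for whatever retrieval the framework uses) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Skill:
    """Hypothetical code-based strategy kept as readable source text."""
    name: str
    description: str
    source: str        # assumed to define `act(observation) -> str`
    revisions: int = 0

    def compile(self) -> Callable[[dict], str]:
        # Execute the stored source in a fresh namespace and return `act`.
        namespace: dict = {}
        exec(self.source, namespace)
        return namespace["act"]


class SkillLibrary:
    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def add(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def retrieve(self, task: str) -> Skill:
        # Assumed retrieval rule: largest word overlap with the description.
        if not self._skills:
            raise LookupError("skill library is empty")
        words = set(task.lower().split())
        return max(
            self._skills.values(),
            key=lambda s: len(words & set(s.description.lower().split())),
        )

    def refine(self, name: str, new_source: str) -> None:
        # Replace a skill's code, e.g. after feedback from an episode.
        skill = self._skills[name]
        skill.source = new_source
        skill.revisions += 1


# Bootstrapping: in this sketch the "demonstrations" are hand-written skills.
library = SkillLibrary()
library.add(Skill(
    name="focus_fire",
    description="attack the weakest visible enemy",
    source=(
        "def act(observation):\n"
        "    target = min(observation['enemies'], key=lambda e: e['health'])\n"
        "    return 'attack ' + target['id']\n"
    ),
))
library.add(Skill(
    name="retreat",
    description="retreat when own health is low",
    source="def act(observation):\n    return 'move_back'\n",
))

obs = {"own_health": 10,
       "enemies": [{"id": "e1", "health": 40}, {"id": "e2", "health": 15}]}
skill = library.retrieve("attack enemy with lowest health")
action = skill.compile()(obs)  # 'attack e2'

# Refinement: the same skill now falls back when the agent is badly hurt.
library.refine("focus_fire", (
    "def act(observation):\n"
    "    if observation['own_health'] < 20:\n"
    "        return 'move_back'\n"
    "    target = min(observation['enemies'], key=lambda e: e['health'])\n"
    "    return 'attack ' + target['id']\n"
))
refined_action = library.retrieve("attack enemy with lowest health").compile()(obs)
```

Keeping each strategy as plain source text is what makes it inspectable: a reader (or a model) can see exactly why an agent acted as it did and can rewrite the code directly. Note that `exec` on generated code is only acceptable in a sandboxed setting.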
Entities
Institutions
- arXiv