COMPASS: VLM-Based Framework for Multi-Agent Coordination

ai-technology · 2026-05-07

COMPASS is a multi-agent framework that uses Vision-Language Models (VLMs) for decentralized, closed-loop decision-making in cooperative multi-agent reinforcement learning (MARL). It addresses sample efficiency, interpretability, and generalization by dynamically generating and refining interpretable, code-based strategies, which are stored in a skill library bootstrapped from expert demonstrations. A structured multi-hop communication protocol propagates entity information between agents, so that a team can build a coherent shared understanding from partial observations. On the SMACv2 benchmark, COMPASS shows considerable improvements over existing methods.
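To make the multi-hop idea concrete, here is a minimal sketch of how entity information could spread through a team over several communication rounds. It is not the protocol from COMPASS itself: the `EntityObs` record, the `merge` and `propagate` functions, the "keep the most recent sighting" rule, and the three-agent chain are all assumptions chosen for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityObs:
    """Hypothetical record of one observed entity (not COMPASS's actual format)."""
    entity_id: str
    position: tuple[float, float]
    timestep: int  # when the entity was last seen


def merge(own: dict[str, EntityObs], incoming: dict[str, EntityObs]) -> dict[str, EntityObs]:
    """Combine two belief tables, keeping the most recent sighting of each entity."""
    merged = dict(own)
    for eid, obs in incoming.items():
        if eid not in merged or obs.timestep > merged[eid].timestep:
            merged[eid] = obs
    return merged


def propagate(beliefs, neighbors, hops):
    """Run `hops` synchronous rounds in which every agent merges its neighbors' tables."""
    for _ in range(hops):
        # Snapshot first so that information travels exactly one hop per round.
        snapshot = {agent: dict(table) for agent, table in beliefs.items()}
        beliefs = {}
        for agent, own in snapshot.items():
            merged = own
            for nb in neighbors.get(agent, ()):
                merged = merge(merged, snapshot[nb])
            beliefs[agent] = merged
    return beliefs


# Assumed toy team: a chain a - b - c where only the end agents see an enemy.
beliefs = {
    "a": {"enemy_1": EntityObs("enemy_1", (1.0, 2.0), 3)},
    "b": {},
    "c": {"enemy_2": EntityObs("enemy_2", (5.0, 5.0), 4)},
}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

one = propagate(beliefs, neighbors, hops=1)  # b now knows both enemies
two = propagate(beliefs, neighbors, hops=2)  # a and c now know both enemies
```

In this toy setup, a single hop is enough for the middle agent to learn about both enemies, while the end agents need two hops, which is the sense in which partial observations are combined into a shared picture.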

Key facts

COMPASS integrates Vision-Language Models (VLMs) for decentralized, closed-loop decision-making.
It dynamically generates and refines interpretable, code-based strategies.
Strategies are stored in a skill library bootstrapped from expert demonstrations.
A structured multi-hop communication protocol propagates entity information.
The framework is evaluated on the SMACv2 benchmark.
COMPASS addresses sample efficiency, interpretability, and generalization in MARL.
It addresses the limitations of text-only LLMs and the difficulties posed by non-Markovian, partially observable tasks.
The framework enables teams to build coherent understanding from partial observations.
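The skill library of code-based strategies can be pictured with a small sketch like the one below. Everything here is hypothetical: the `Skill` and `SkillLibrary` names, the keyword-overlap retrieval, the `act(obs)` convention, and the observation fields are assumptions, and the strategy source is hand-written, whereas in COMPASS it would be generated and refined by a VLM.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    """Hypothetical library entry: a strategy stored as readable source code."""
    name: str
    description: str
    source: str  # must define a function act(obs)
    version: int = 1


class SkillLibrary:
    def __init__(self):
        self._skills = {}

    def add(self, skill):
        self._skills[skill.name] = skill

    def get(self, name):
        return self._skills[name]

    def retrieve(self, query):
        """Return the skill whose description shares the most words with the query."""
        words = set(query.lower().split())
        return max(
            self._skills.values(),
            key=lambda s: len(words & set(s.description.lower().split())),
        )

    def refine(self, name, new_source):
        """Replace a skill's code with a refined version and bump its version number."""
        old = self._skills[name]
        self._skills[name] = Skill(name, old.description, new_source, old.version + 1)

    def run(self, name, obs):
        """Execute the stored source and call its act(obs) function."""
        namespace = {}
        exec(self._skills[name].source, namespace)
        return namespace["act"](obs)


FOCUS_FIRE = '''
def act(obs):
    target = min(obs["enemies"], key=lambda e: e["health"])
    return ("attack", target["id"])
'''

# Assumed refinement after feedback: retreat when the agent's own health is low.
FOCUS_FIRE_V2 = '''
def act(obs):
    if obs["health"] < 0.25:
        return ("retreat", obs["rally_point"])
    target = min(obs["enemies"], key=lambda e: e["health"])
    return ("attack", target["id"])
'''

lib = SkillLibrary()
lib.add(Skill("focus_fire", "focus fire: attack the weakest visible enemy", FOCUS_FIRE))
lib.add(Skill("kite", "kite: retreat while ranged units attack", "def act(obs):\n    return ('retreat', obs['rally_point'])\n"))

obs = {
    "health": 0.9,
    "rally_point": (0, 0),
    "enemies": [{"id": "e1", "health": 0.8}, {"id": "e2", "health": 0.3}],
}
chosen = lib.retrieve("attack weakest enemy")
first_action = lib.run(chosen.name, obs)

lib.refine("focus_fire", FOCUS_FIRE_V2)
refined_action = lib.run("focus_fire", dict(obs, health=0.1))
```

Storing strategies as plain source keeps them inspectable, which is one way to read the interpretability claim; a real system would need to sandbox generated code rather than call `exec` directly.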
