ARTFEED — Contemporary Art Intelligence

COMPASS: VLM-Based Framework for Multi-Agent Coordination

ai-technology · 2026-05-07

COMPASS is a multi-agent framework that uses Vision-Language Models (VLMs) for decentralized, closed-loop decision-making in cooperative multi-agent reinforcement learning (MARL). It targets sample efficiency, interpretability, and generalization by dynamically generating and refining interpretable, code-based strategies, which are stored in a skill library bootstrapped from expert demonstrations. A structured multi-hop communication protocol propagates entity information between agents, so a team can build a coherent shared understanding from partial observations. On the SMACv2 benchmark, COMPASS shows substantial improvements over existing methods.
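To make the idea of a library of code-based strategies more concrete, here is a minimal Python sketch. It is not the COMPASS implementation: the names `Skill`, `SkillLibrary`, `focus_fire`, and `kite`, the observation keys, and the smoothed success-rate scoring are all assumptions introduced for illustration. In the framework as described, a VLM would generate and refine the strategies; in this sketch the strategies are hand-written functions and "refinement" is simply overwriting an entry.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical observation format: feature name -> value in [0, 1].
Observation = Dict[str, float]


@dataclass
class Skill:
    """A named, human-readable strategy plus its outcome statistics."""
    name: str
    policy: Callable[[Observation], str]
    successes: int = 0
    trials: int = 0

    @property
    def success_rate(self) -> float:
        # Laplace smoothing (an assumption of this sketch): an untried
        # skill scores 0.5 instead of dividing by zero.
        return (self.successes + 1) / (self.trials + 2)


@dataclass
class SkillLibrary:
    """Stores skills by name and tracks how well each one has worked."""
    skills: Dict[str, Skill] = field(default_factory=dict)

    def add(self, skill: Skill) -> None:
        # Adding under an existing name replaces the old version,
        # which stands in for a refinement step.
        self.skills[skill.name] = skill

    def record(self, name: str, success: bool) -> None:
        skill = self.skills[name]
        skill.trials += 1
        skill.successes += int(success)

    def best(self) -> Skill:
        return max(self.skills.values(), key=lambda s: s.success_rate)


def focus_fire(obs: Observation) -> str:
    """Hypothetical strategy: always attack."""
    return "attack"


def kite(obs: Observation) -> str:
    """Hypothetical strategy: retreat when own health is low."""
    return "retreat" if obs.get("own_health", 1.0) < 0.3 else "attack"


def build_demo_library() -> SkillLibrary:
    # Stand-in for bootstrapping from expert demonstrations.
    library = SkillLibrary()
    library.add(Skill("focus_fire", focus_fire))
    library.add(Skill("kite", kite))
    return library
```

In a closed loop, an agent would pick `library.best()`, act with its `policy`, and feed the episode outcome back through `record`, so that the ranking of strategies changes as evidence accumulates.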

Key facts

  • COMPASS integrates Vision-Language Models (VLMs) for decentralized, closed-loop decision-making.
  • It dynamically generates and refines interpretable, code-based strategies.
  • Strategies are stored in a skill library bootstrapped from expert demonstrations.
  • A structured multi-hop communication protocol propagates entity information.
  • The framework is evaluated on the SMACv2 benchmark.
  • COMPASS addresses sample efficiency, interpretability, and generalization in MARL.
  • It targets the limitations of text-only LLMs in non-Markovian, partially observable tasks.
  • The framework enables teams to build coherent understanding from partial observations.
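
The multi-hop communication idea in the list above can be illustrated with a short Python sketch. This is an assumption-laden toy, not the protocol from COMPASS: the function `propagate`, the entity names, the grid positions, and the fixed communication graph are all hypothetical. It only shows how entity information seen by one agent can reach teammates several hops away.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]
EntityMap = Dict[str, Position]  # entity id -> last known position


def propagate(
    local_obs: Dict[str, EntityMap],
    neighbors: Dict[str, List[str]],
    hops: int,
) -> Dict[str, EntityMap]:
    """Share entity information along a communication graph.

    Each hop, every agent merges what its direct neighbors knew at the
    start of that hop. After k hops, an agent knows every entity seen by
    any agent within k edges of it.
    """
    beliefs = {agent: dict(obs) for agent, obs in local_obs.items()}
    for _ in range(hops):
        # Snapshot so that information moves exactly one edge per hop.
        snapshot = {agent: dict(known) for agent, known in beliefs.items()}
        for agent, peers in neighbors.items():
            for peer in peers:
                for entity, pos in snapshot[peer].items():
                    # Keep the agent's own reading if it already has one.
                    beliefs[agent].setdefault(entity, pos)
    return beliefs


# Hypothetical three-agent chain: a1 <-> a2 <-> a3.
obs = {
    "a1": {"enemy_1": (2, 3)},
    "a2": {},
    "a3": {"enemy_2": (7, 1)},
}
graph = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"]}

one_hop = propagate(obs, graph, hops=1)
two_hops = propagate(obs, graph, hops=2)
```

With this chain, one hop is enough for the middle agent `a2` to learn about both enemies, while the end agents `a1` and `a3` need two hops to learn about the enemy seen at the other end.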

Entities

Institutions

  • arXiv

Sources