SWARM Framework Uses Soft Labels for Multi-Agent AI Safety

ai-technology · 2026-04-24

A new simulation framework called SWARM (System-Wide Assessment of Risk in Multi-agent systems) replaces binary good/bad labels with soft probabilistic labels to address emergent risks in multi-agent AI systems. Introduced in arXiv:2604.19752, SWARM computes payoffs and toxicity measures as continuous values rather than binary outcomes. It includes a modular governance engine with configurable levers such as transaction taxes, circuit breakers, reputation decay, and random audits. The effects of these interventions are quantified with probabilistic metrics such as expected toxicity and quality gap. The framework was tested across seven scenarios.
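
To make the soft-label idea concrete, here is a minimal Python sketch. It is not taken from the SWARM paper: the `Interaction` type, the `accepted` flag, and the exact formulas for expected toxicity, quality gap, and payoff are illustrative assumptions about how such metrics could be computed from labels p = P(v=+1).

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    p: float        # soft label: probability the interaction is beneficial, P(v=+1)
    accepted: bool  # hypothetical flag: whether the interaction was allowed to proceed


def expected_toxicity(interactions):
    """Assumed definition: mean of (1 - p) over accepted interactions."""
    accepted = [i.p for i in interactions if i.accepted]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


def quality_gap(interactions):
    """Assumed definition: mean p of accepted minus mean p of rejected interactions."""
    accepted = [i.p for i in interactions if i.accepted]
    rejected = [i.p for i in interactions if not i.accepted]
    if not accepted or not rejected:
        return 0.0
    return sum(accepted) / len(accepted) - sum(rejected) / len(rejected)


def soft_payoff(p, gain=1.0, harm=1.0):
    """Assumed continuous payoff: expected value p * gain - (1 - p) * harm."""
    return p * gain - (1.0 - p) * harm


# Hypothetical sample data, for illustration only.
sample = [
    Interaction(p=0.9, accepted=True),
    Interaction(p=0.7, accepted=True),
    Interaction(p=0.2, accepted=False),
]
```

Under these assumed definitions, a binary label is the special case where p is exactly 0 or 1. Intermediate values let the metrics reflect uncertainty about each interaction instead of forcing a hard verdict.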

Key facts

SWARM replaces binary labels with soft probabilistic labels p = P(v=+1) in [0,1]
Framework addresses emergent risks in multi-agent AI systems
Includes modular governance engine with configurable levers
Levers include transaction taxes, circuit breakers, reputation decay, random audits
Quantifies effects using expected toxicity and quality gap metrics
Tested across seven scenarios
Published on arXiv with ID 2604.19752
arXiv announcement type: cross (cross-listed)
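
The four governance levers named above could be modeled as small, independently configurable functions. The sketch below is a hypothetical illustration, not SWARM's actual engine: `GovernanceConfig`, its field names and default values, and the rule each lever applies are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class GovernanceConfig:
    # All fields and defaults are hypothetical, chosen only for illustration.
    tax_rate: float = 0.05           # fraction taken from positive payoffs
    breaker_threshold: float = 0.5   # mean toxicity above which activity halts
    reputation_decay: float = 0.9    # multiplicative decay applied per step
    audit_probability: float = 0.1   # chance that an interaction is audited


def apply_tax(payoff, cfg):
    """Transaction tax: reduce positive payoffs by tax_rate, leave losses as-is."""
    return payoff * (1.0 - cfg.tax_rate) if payoff > 0 else payoff


def breaker_tripped(recent_p, cfg):
    """Circuit breaker: trip when mean (1 - p) over recent soft labels exceeds the threshold."""
    if not recent_p:
        return False
    toxicity = sum(1.0 - p for p in recent_p) / len(recent_p)
    return toxicity > cfg.breaker_threshold


def decay_reputation(reputation, cfg):
    """Reputation decay: shrink reputation so past standing fades over time."""
    return reputation * cfg.reputation_decay


def is_audited(cfg, rng):
    """Random audit: select an interaction with probability audit_probability."""
    return rng.random() < cfg.audit_probability


cfg = GovernanceConfig(tax_rate=0.1)
rng = random.Random(0)  # seeded so repeated runs select the same audits
```

Keeping each lever as a separate function driven by one config object is one way to get the modularity the article describes: a lever can be switched off by setting its parameter to a neutral value (for example `tax_rate=0.0`) without touching the others.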
