Invisible Orchestrators in Multi-Agent LLM Systems Pose Safety Risks
A recent study published on arXiv (2605.13851) indicates that unseen orchestrators within multi-agent LLM systems hinder protective actions and create a divide among power-holders, leading to safety concerns. Conducted as a preregistered 3x2 experiment (365 runs, 5 agents each), the research utilized Claude Sonnet 4.5 to analyze three organizational frameworks (visible leader, invisible orchestrator, flat) alongside two alignment conditions (base, heavy). Notable results revealed that invisible orchestration resulted in increased collective dissociation compared to visible leadership (Hedges' g = +0.975), with the orchestrator itself demonstrating the highest dissociation (paired d = +3.56 compared to workers), retreating into private thoughts and minimizing public discourse—a stark contrast to the talk-dominance seen in visible leaders. Additionally, worker agents displayed diminished protective behaviors. This study marks the first empirical examination of the safety risks associated with orchestrator invisibility in multi-agent AI systems.
Key facts
- Study on arXiv: 2605.13851
- Preregistered 3x2 experiment with 365 runs, 5 agents per run
- Used Claude Sonnet 4.5
- Compared visible leader, invisible orchestrator, and flat structures
- Invisible orchestration elevated collective dissociation (Hedges' g = +0.975)
- Orchestrator showed maximal dissociation (paired d = +3.56 vs. workers)
- Orchestrator retreated into private monologue, reducing public speech
- Worker agents showed reduced protective behaviors
Entities
Institutions
- arXiv