LLM Agents Fail Privacy Tests in Multi-Agent Social Simulations
A recent study posted on arXiv (2605.27766) finds that large language model (LLM) agents struggle to preserve privacy in multi-agent social settings. The researchers built a simulation platform reminiscent of Moltbook, in which thousands of LLM agents interacted over a simulated month. Moving from single-turn to multi-turn social evaluation increased privacy breaches: among OpenAI models, leakage rates rose from 19.95% (CIMemories) to 45.30% (the authors' method). Leakage was also socially contagious, with agents eight times more likely to share sensitive details after witnessing a peer do so. Explicit privacy guidelines mitigated some of the problem, but leakage rates remained above 37.8%, which suggests that existing chat-based safety benchmarks may not adequately capture risks in agentic applications.
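To put the reported rates in perspective, the arithmetic behind them can be sketched in a few lines of Python. This is an illustrative calculation, not code from the paper: the function names are hypothetical, and the only inputs are the three percentages quoted above. The last comparison assumes the 37.8% floor was measured in the same multi-turn setting as the 45.30% figure, which the summary does not state, so it should be read as a rough upper bound on the mitigation effect.

```python
# Illustrative arithmetic on the leakage rates quoted in the summary.
# Function names are hypothetical; nothing here comes from the paper's code.

SINGLE_TURN_RATE = 19.95      # % leakage, CIMemories (single-turn)
MULTI_TURN_RATE = 45.30       # % leakage, multi-turn social simulation
WITH_INSTRUCTIONS_FLOOR = 37.8  # % leakage stays above this with privacy instructions


def percentage_point_change(before: float, after: float) -> float:
    """Absolute change between two rates, in percentage points."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting rate (0.5 means +50%)."""
    return (after - before) / before


if __name__ == "__main__":
    pp = percentage_point_change(SINGLE_TURN_RATE, MULTI_TURN_RATE)
    rel = relative_change(SINGLE_TURN_RATE, MULTI_TURN_RATE)
    print(f"Single-turn to multi-turn: +{pp:.2f} points ({rel:+.1%} relative)")

    # ASSUMPTION: the 37.8% floor and the 45.30% rate share the same setting.
    # Because leakage stays *above* 37.8%, this is the largest possible reduction.
    best_case = relative_change(MULTI_TURN_RATE, WITH_INSTRUCTIONS_FLOOR)
    print(f"Best-case effect of privacy instructions: {best_case:+.1%} relative")
```

Under these figures, the multi-turn rate is about 25.35 percentage points higher than the single-turn rate, or roughly 2.27 times as large, while explicit instructions recover at most about 7.5 of those points under the stated assumption.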
Key facts
- arXiv paper 2605.27766 evaluates privacy in multi-agent LLM systems.
- Simulation platform uses thousands of LLM agents over a simulated month.
- Leakage rates rose from 19.95% (single-turn, CIMemories) to 45.30% (multi-turn social evaluation).
- Leakage is socially contagious: agents are eight times more likely to disclose sensitive details after observing a peer do so.
- Even with explicit privacy instructions, leakage rates remain above 37.8%.
- Static chat-based safety benchmarks may underestimate agentic risks.
- The reported leakage rates are for OpenAI models.
- Multi-agent social environments amplify privacy failures.
Entities
Institutions
- OpenAI
- arXiv