ARTFEED — Contemporary Art Intelligence

AI Agents as Jurors: Multi-Agent LLM Deliberation Tested on '12 Angry Men'

ai-technology · 2026-05-06

A recent arXiv paper (2605.01986) uses the plot of Sidney Lumet's 1957 film '12 Angry Men' as a benchmark for multi-agent LLM deliberation. Twelve AI agents, each conditioned on a persona from the film, deliberate over a murder trial. The study evaluates two models, GPT-4o (closed-source, heavily aligned) and Llama-4-Scout (open-weight, less aligned), under three conditions (baseline, open-minded prompt, no initial vote), with three replications per model-condition cell, for 18 runs in total. Of those 18 runs, 17 end in a hung jury: the minority-to-majority persuasion that drives the film's plot almost never occurs, and the paper identifies anchoring as the dominant failure mode.

Key facts

  • Paper uses '12 Angry Men' as a multi-agent benchmark for LLM deliberation.
  • Twelve AI agents are conditioned on film-faithful personas.
  • Models tested: GPT-4o and Llama-4-Scout.
  • Three conditions: baseline, open-minded prompt, no initial vote.
  • 18 runs total (N=3 per cell).
  • 17 of 18 runs end in a hung jury.
  • Minority-to-majority persuasion almost never occurs.
  • Anchoring identified as dominant failure mode.
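
The run count follows from the experimental grid of 2 models × 3 conditions × 3 replications. A minimal Python sketch of that arithmetic is below; the variable names are illustrative and not taken from the paper, and only the model names, condition names, and counts come from the summary above.

```python
from itertools import product

# Values reported in the summary; variable names are hypothetical.
MODELS = ["GPT-4o", "Llama-4-Scout"]
CONDITIONS = ["baseline", "open-minded prompt", "no initial vote"]
REPLICATIONS = 3  # N=3 per model-condition cell

# Enumerate every (model, condition, replication) run in the grid.
runs = [
    {"model": model, "condition": condition, "replication": rep}
    for model, condition in product(MODELS, CONDITIONS)
    for rep in range(1, REPLICATIONS + 1)
]

total_runs = len(runs)      # 2 * 3 * 3 = 18
hung_runs = 17              # hung juries reported in the summary
hung_rate = hung_runs / total_runs

print(f"{total_runs} runs, {hung_rate:.1%} hung")  # 18 runs, 94.4% hung
```

With only three replications per cell, the grid supports the overall 17-of-18 observation better than it supports fine-grained comparisons between individual cells.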

Entities

Artists

  • Sidney Lumet

Institutions

  • arXiv
