ARTFEED — Contemporary Art Intelligence

LLMs Struggle with Deception in Secret Hitler Game

ai-technology · 2026-05-25

A recent study published on arXiv presents an open-source framework for assessing how well Large Language Models (LLMs) can deceive other players in the social deduction board game Secret Hitler. The researchers introduce new metrics, including Role Identification Accuracy, Deception Retention Rate, and Game State Impact Rate, to measure LLM performance. They report a gap between the models' conversational ability and their strategic depth, with LLMs lagging behind both human players and rule-based systems. Chain-of-Thought prompting and internal memory did not help: with these reasoning enhancements, fascist roles won up to 23.2% less often. Rule-based agents, by contrast, matched expert human voting 86.7% of the time, which highlights how difficult it remains to evaluate deception in LLMs.

Key facts

  • Study evaluates LLMs in the social deduction game Secret Hitler
  • Open-source framework introduced
  • Novel metrics: Role Identification Accuracy, Deception Retention Rate, Game State Impact Rate
  • Gap identified between conversational ability and strategic depth
  • Chain-of-Thought prompting and internal memory did not improve performance
  • Fascist roles had win rates up to 23.2% lower with reasoning enhancements
  • Rule-based agents align with expert human voting 86.7% of the time
  • Llama 3.1 70B model tested
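
The voting-alignment figure above can be read as a simple agreement rate. The sketch below is a minimal illustration under that assumption; the study's exact definition is not given here, so the function name `voting_alignment` and the toy vote lists are hypothetical and are not taken from the framework.

```python
from typing import Sequence


def voting_alignment(agent_votes: Sequence[str], expert_votes: Sequence[str]) -> float:
    """Share of votes where the agent matches the expert.

    Assumed definition for illustration; the study may compute it differently.
    """
    if len(agent_votes) != len(expert_votes):
        raise ValueError("vote sequences must have the same length")
    if not agent_votes:
        raise ValueError("at least one vote is required")
    # Count positions where both voters chose the same option.
    matches = sum(a == e for a, e in zip(agent_votes, expert_votes))
    return matches / len(agent_votes)


# Toy data invented for illustration ("ja" and "nein" are the game's vote options).
agent = ["ja", "nein", "ja", "ja"]
expert = ["ja", "nein", "nein", "ja"]
alignment = voting_alignment(agent, expert)
print(f"{alignment:.1%}")  # 3 of 4 votes match
```

Under this reading, a reported 86.7% would mean the rule-based agent cast the same vote as the expert in roughly 87 of every 100 voting decisions.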

Entities

Institutions

  • arXiv

Sources