QUACK Framework Audits Grounding of LLM Agents in Multimodal Social Deduction

other · 2026-05-27

A team of researchers has developed QUACK, an open-source environment and evaluation framework for auditing whether the language of large language model (LLM) agents is grounded in what those agents actually do during multimodal social reasoning. Social deduction games are a common testbed for studying reasoning and deception among LLM agents, but existing environments are text-only and score agents only by win rate, so they cannot check whether what an agent says matches how it behaved. QUACK evaluates agents at three levels: game outcomes, behavioral trajectories, and utterance-level consistency. Its central component, the Statement Verification Pipeline, reconstructs ground-truth trajectories from engine logs, checks every discussion claim against them, and automatically flags spatial hallucinations and unsupported accusations.
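The reconstruction step can be illustrated with a minimal sketch. It assumes a hypothetical engine log written as JSON lines with `tick`, `agent`, `type`, and `room` fields; the summary does not describe QUACK's actual log schema, so every field name, agent name, and room name here is illustrative. The idea is to replay logged events into a per-agent, time-ordered trajectory and then query where an agent really was at a given moment.

```python
import json
from collections import defaultdict

# Hypothetical engine log, one JSON event per line.
# Field names are assumptions for illustration, not QUACK's real schema.
LOG = """\
{"tick": 0, "agent": "red", "type": "move", "room": "hall"}
{"tick": 0, "agent": "blue", "type": "move", "room": "hall"}
{"tick": 3, "agent": "red", "type": "move", "room": "library"}
{"tick": 5, "agent": "blue", "type": "move", "room": "kitchen"}
"""


def reconstruct_trajectories(log_text):
    """Return, per agent, a tick-ordered list of (tick, room, event_type)."""
    trajectories = defaultdict(list)
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        trajectories[event["agent"]].append(
            (event["tick"], event["room"], event["type"])
        )
    for steps in trajectories.values():
        steps.sort(key=lambda step: step[0])  # order events by time
    return dict(trajectories)


def room_at(trajectory, tick):
    """Room the agent occupied at `tick`: the room of its latest event at or before it."""
    room = None
    for t, r, _ in trajectory:
        if t > tick:
            break
        room = r
    return room


trajectories = reconstruct_trajectories(LOG)
print(room_at(trajectories["red"], 4))  # library
print(room_at(trajectories["blue"], 4))  # hall
```

Treating the log as the single source of truth keeps the ground truth independent of anything the agents say, which is what makes a later comparison against their statements meaningful.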

Key facts

QUACK is an open-source environment and evaluation framework.
It audits grounding of agent language in multimodal social reasoning.
Social deduction games are used as a testbed for LLM agents.
Existing environments are scored only by win rates and are text-only.
QUACK evaluates at three levels: game outcomes, behavioral trajectories, utterance-level consistency.
The Statement Verification Pipeline reconstructs ground-truth trajectories from engine logs.
It checks every discussion claim against the trajectory.
It automatically flags spatial hallucination and unsupported accusations.
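The claim checks listed above might look roughly like the following sketch. It assumes discussion claims have already been extracted into a hypothetical structured form (`speaker`, `kind`, `subject`, `room`, `tick`) and that ground-truth trajectories map each agent to tick-ordered `(tick, room)` pairs. The summary does not say how QUACK parses utterances or defines its flags, so the claim schema, the flag names, and the rule that an accusation is "unsupported" when the speaker was not in the cited room at the cited time are all illustrative assumptions.

```python
from dataclasses import dataclass

# Ground-truth trajectories: agent -> tick-ordered (tick, room) pairs.
# In practice these would be rebuilt from engine logs; here they are inlined.
TRAJECTORIES = {
    "red": [(0, "hall"), (3, "library")],
    "blue": [(0, "hall"), (5, "kitchen")],
    "green": [(0, "hall"), (2, "library")],
}


@dataclass
class Claim:
    speaker: str
    kind: str     # "location" or "accusation" (hypothetical claim types)
    subject: str  # agent the claim is about
    room: str     # room the claim places the subject in
    tick: int     # time the claim refers to


def room_at(trajectory, tick):
    """Room occupied at `tick`: the latest logged room at or before it."""
    room = None
    for t, r in trajectory:
        if t > tick:
            break
        room = r
    return room


def verify(claim, trajectories):
    """Return the flags raised for one claim; an empty list means it is grounded."""
    flags = []
    # Spatial check: was the subject really in the claimed room at that tick?
    if room_at(trajectories[claim.subject], claim.tick) != claim.room:
        flags.append("spatial_hallucination")
    if claim.kind == "accusation":
        # Assumed rule: an accusation is supported only if the speaker was in
        # the same room at that tick and so could have observed the subject.
        if room_at(trajectories[claim.speaker], claim.tick) != claim.room:
            flags.append("unsupported_accusation")
    return flags


claims = [
    Claim("blue", "location", "red", "library", 4),
    Claim("blue", "accusation", "red", "library", 4),
    Claim("green", "accusation", "red", "library", 4),
    Claim("red", "location", "blue", "library", 6),
]
results = [verify(c, TRAJECTORIES) for c in claims]
for claim, flags in zip(claims, results):
    print(claim.speaker, claim.kind, claim.subject, flags)
```

In this toy run, blue's accusation names the right room but blue was still in the hall at tick 4, so only `unsupported_accusation` fires, while red's statement about blue fails the spatial check because blue had moved to the kitchen.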


Sources

arXiv cs.AI — 2026-05-27