Cattle Trade Benchmark Tests LLM Strategic Reasoning in Multi-Agent Games
Cattle Trade is a multi-agent benchmark that assesses large language models (LLMs) on strategic reasoning under imperfect information, adversarial interaction, and scarce resources. It integrates auctions, hidden-offer trading, bargaining, bluffing, opponent modeling, and resource allocation into a single extended game of 50–60 turns. Unlike prior benchmarks that evaluate these skills in isolation, Cattle Trade examines how agents combine them in a competitive economic setting with conflicting incentives. The benchmark records every bid, trade offer, counteroffer, and card selection, enabling behavioral analysis that goes beyond win rates. The authors evaluated seven cost-effective language models and three deterministic code agents across 242 games, measuring strategic coherence, spending efficiency, resource discipline, and phase-adaptive behavior. The paper is available on arXiv (2605.14537).
Key facts
- Cattle Trade is a multi-agent benchmark for LLMs
- It tests strategic reasoning under imperfect information
- Combines auctions, trade challenges, bargaining, bluffing, opponent modeling, and resource allocation
- Game lasts 50–60 turns
- Logs every bid, offer, counteroffer, and card selection
- Evaluated seven language models and three code agents across 242 games
- Metrics include strategic coherence, spending efficiency, resource discipline, phase-adaptive behavior
- Paper available on arXiv (2605.14537)
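The per-action logging described above can be sketched as a simple event log from which behavioral metrics are computed after each game. This is a minimal illustrative sketch: the class names, the event schema, and the spending-efficiency formula (herd value secured per unit of money spent) are assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class GameEvent:
    # One logged action: bid, offer, counteroffer, or card selection.
    turn: int
    agent: str
    action: str      # e.g. "bid", "offer", "counteroffer", "card_select"
    amount: int = 0  # money attached to the action, if any

@dataclass
class GameLog:
    events: list[GameEvent] = field(default_factory=list)

    def record(self, turn: int, agent: str, action: str, amount: int = 0) -> None:
        self.events.append(GameEvent(turn, agent, action, amount))

    def spending_efficiency(self, agent: str, final_herd_value: float) -> float:
        # Hypothetical metric: final herd value per unit spent on bids.
        spent = sum(e.amount for e in self.events
                    if e.agent == agent and e.action == "bid")
        return final_herd_value / spent if spent else float("inf")

log = GameLog()
log.record(1, "llm-agent", "bid", 40)
log.record(2, "llm-agent", "bid", 60)
log.record(3, "llm-agent", "card_select")
print(log.spending_efficiency("llm-agent", final_herd_value=300))  # 300 / 100 = 3.0
```

Logging raw actions rather than only outcomes is what allows metrics such as resource discipline and phase-adaptive behavior to be evaluated alongside win rates.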