Cattle Trade Benchmark Tests LLM Strategic Reasoning in Multi-Agent Games
Cattle Trade is a multi-agent benchmark that assesses large language models (LLMs) on strategic reasoning under imperfect information, adversarial interaction, and scarce resources. It integrates auctions, hidden-offer trading, bargaining, bluffing, opponent modeling, and resource allocation into a single extended game of 50–60 turns. Unlike prior benchmarks that evaluate these skills in isolation, Cattle Trade examines how agents combine them in a competitive economic setting with conflicting incentives. The benchmark records every bid, trade offer, counteroffer, and card selection, enabling behavioral analysis that goes beyond win rates. The authors evaluated seven cost-effective language models and three deterministic code agents across 242 games, measuring strategic coherence, spending efficiency, resource discipline, and phase-adaptive behavior. The paper is available on arXiv (2605.14537).
Key facts
- Cattle Trade is a multi-agent benchmark for LLMs
- It tests strategic reasoning under imperfect information
- Combines auctions, trade challenges, bargaining, bluffing, opponent modeling, and resource allocation
- Game lasts 50–60 turns
- Logs every bid, offer, counteroffer, and card selection
- Evaluated seven language models and three code agents across 242 games
- Metrics include strategic coherence, spending efficiency, resource discipline, phase-adaptive behavior
- Paper available on arXiv (2605.14537)
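The per-action logging described above can be sketched as a simple event log from which behavioral metrics are computed after each game. This is a minimal illustrative sketch: the class names, the event schema, and the spending-efficiency formula (herd value secured per unit of money spent) are assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class GameEvent:
    # One logged action: bid, offer, counteroffer, or card selection.
    turn: int
    agent: str
    action: str      # e.g. "bid", "offer", "counteroffer", "card_select"
    amount: int = 0  # money attached to the action, if any

@dataclass
class GameLog:
    events: list[GameEvent] = field(default_factory=list)

    def record(self, turn: int, agent: str, action: str, amount: int = 0) -> None:
        self.events.append(GameEvent(turn, agent, action, amount))

    def spending_efficiency(self, agent: str, final_herd_value: float) -> float:
        # Hypothetical metric: final herd value per unit spent on bids.
        spent = sum(e.amount for e in self.events
                    if e.agent == agent and e.action == "bid")
        return final_herd_value / spent if spent else float("inf")

log = GameLog()
log.record(1, "llm-agent", "bid", 40)
log.record(2, "llm-agent", "bid", 60)
log.record(3, "llm-agent", "card_select")
print(log.spending_efficiency("llm-agent", final_herd_value=300))  # 300 / 100 = 3.0
```

Logging raw actions rather than only outcomes is what allows metrics such as resource discipline and phase-adaptive behavior to be evaluated alongside win rates.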