ARTFEED — Contemporary Art Intelligence

Cattle Trade Benchmark Tests LLM Strategic Reasoning in Multi-Agent Games

ai-technology · 2026-05-16

Cattle Trade, a new multi-agent benchmark, evaluates large language models (LLMs) on strategic reasoning under imperfect information, adversarial interaction, and scarce resources. The benchmark combines auctions, hidden-offer trading challenges, bargaining, bluffing, opponent modeling, and resource allocation in a single extended game spanning 50–60 turns. Unlike earlier benchmarks that tested these skills in isolation, Cattle Trade examines how agents deploy them together in a competitive economic setting with conflicting incentives. It logs every bid, trade offer, counteroffer, and card selection, enabling behavioral analysis that goes beyond win rates. The researchers evaluated seven cost-effective language models and three deterministic code agents across 242 games, measuring strategic coherence, spending efficiency, resource discipline, and phase-adaptive behavior. The paper is available on arXiv (2605.14537).
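To make the logging idea concrete, here is a minimal sketch of a hidden-offer trade round with full action recording. This is an illustration only: the function and class names (`GameLog`, `hidden_offer_round`) and the tie-free resolution rule are assumptions for the example, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class GameLog:
    """Append-only record of every action, in the spirit of the
    benchmark's per-bid, per-offer, per-card logging."""
    events: list = field(default_factory=list)

    def record(self, turn, agent, action, value):
        self.events.append({"turn": turn, "agent": agent,
                            "action": action, "value": value})

def hidden_offer_round(turn, offers, log):
    """Resolve one sealed-offer trade: each agent's offer stays hidden
    until both are submitted; the higher offer wins (no-tie assumption)."""
    for agent, amount in offers.items():
        log.record(turn, agent, "hidden_offer", amount)
    winner = max(offers, key=offers.get)
    log.record(turn, winner, "wins_trade", offers[winner])
    return winner

# Example round: agent_b's larger hidden offer takes the trade,
# and all three events land in the log for later behavioral analysis.
log = GameLog()
winner = hidden_offer_round(1, {"agent_a": 40, "agent_b": 90}, log)
```

Because every offer is recorded before resolution, analyses such as spending efficiency or bluffing frequency can be computed from the event stream afterward rather than from win rates alone.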

Key facts

  • Cattle Trade is a multi-agent benchmark for LLMs
  • It tests strategic reasoning under imperfect information
  • Combines auctions, trade challenges, bargaining, bluffing, opponent modeling, and resource allocation
  • Game lasts 50–60 turns
  • Logs every bid, offer, counteroffer, and card selection
  • Evaluated seven language models and three code agents across 242 games
  • Metrics include strategic coherence, spending efficiency, resource discipline, phase-adaptive behavior
  • Paper available on arXiv (2605.14537)

Entities

Institutions

  • arXiv

Sources