ARTFEED — Contemporary Art Intelligence

LLM Planning Agents: How Much Competence Comes from the Harness?

ai-technology · 2026-04-30

A new preprint on arXiv (2604.07236) investigates how much of an AI agent's performance is attributable to the planning harness rather than the underlying language model. The researchers externalized a planning harness for the game Collaborative Battleship into four layers: posterior belief tracking, declarative planning, symbolic reflection, and an LLM-backed revision gate. Across 54 games, they measured win rate as the primary metric and F1 score as the secondary metric, defining 'heavy lifting' as the largest positive marginal contribution to win rate. Declarative planning alone raised win rate by 24.1 percentage points over a belief-only harness while requiring zero LLM calls. The findings suggest that the harness itself carries significant competence, raising questions about the residual role of the LLM in planning agents.
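The 'heavy lifting' definition can be illustrated with a minimal Python sketch. It assumes layers are added one at a time and that each layer's marginal contribution is the change in win rate relative to the previous configuration; the preprint's exact ablation procedure may differ. The function names, configuration labels, and win rates below are hypothetical placeholders, not figures from the study.

```python
from typing import Dict, List, Optional, Tuple


def marginal_contributions(ladder: List[Tuple[str, float]]) -> Dict[str, float]:
    """Win-rate change, in percentage points, from adding each layer.

    `ladder` lists (configuration name, win rate in [0, 1]) in the order
    layers are stacked; the first entry is the baseline configuration.
    """
    margins = {}
    for (_, prev_rate), (name, rate) in zip(ladder, ladder[1:]):
        margins[name] = round((rate - prev_rate) * 100, 1)
    return margins


def heavy_lifter(margins: Dict[str, float]) -> Optional[str]:
    """Layer with the largest positive marginal contribution, if any."""
    positive = {name: pp for name, pp in margins.items() if pp > 0}
    if not positive:
        return None
    return max(positive, key=positive.get)


# Hypothetical win rates for illustration only (NOT results from the paper).
ladder = [
    ("belief_only", 0.40),
    ("plus_declarative_planning", 0.55),
    ("plus_symbolic_reflection", 0.60),
    ("plus_llm_revision_gate", 0.58),
]

margins = marginal_contributions(ladder)
print(margins)
print(heavy_lifter(margins))
```

Under this reading, a layer that lowers win rate has a negative margin and can never be the heavy lifter, which is why the sketch filters for positive values before taking the maximum.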

Key facts

  • arXiv:2604.07236
  • Agent harnesses can change end-to-end performance by as much as a factor of six on a fixed model
  • Planning harness for Collaborative Battleship externalized into four layers
  • Declarative planning provided +24.1 pp win rate over belief-only harness
  • Zero LLM calls needed for declarative planning layer
  • 54 games were played
  • Primary metric: win rate; secondary: F1
  • Heavy lifting defined as the largest positive marginal contribution to the primary metric

Entities

Institutions

  • arXiv

Sources