In-Context Prompting Outperforms Agent Orchestration for Procedural Tasks
A new study posted to arXiv (2604.27891) finds that in-context prompting, where the entire procedure is embedded in the system prompt and the LLM manages its own execution, outperforms external agent orchestration on procedural tasks. The researchers compared LangGraph, CrewAI, Google ADK, and the OpenAI Agents SDK against a self-managing LLM across three domains: travel booking (14 nodes), Zoom support (14 nodes), and insurance claims processing (55 nodes). Judged by an LLM-as-judge on five quality criteria, the in-context approach scored 4.53–5.00 on a 5-point scale, while LangGraph scored 4.17–4.84. The orchestrated system failed on 24% of travel tasks, 9% of Zoom tasks, and 17% of insurance tasks, versus 11.5%, 0.5%, and 5% for the in-context method.
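To make the contrast concrete, here is a minimal sketch of the in-context approach using the OpenAI chat completions API. The procedure text, model name, and travel scenario are illustrative assumptions, not the paper's actual prompts.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical procedure text; the study's real prompts are not reproduced here.
PROCEDURE = """You are a travel-booking assistant. Follow these steps in order:
1. Collect destination, dates, and budget.
2. Check availability before quoting a price.
3. Confirm all details with the user before booking.
4. If any step fails, apologize and escalate to a human agent."""

# The entire workflow lives in the system prompt; the model manages its own
# progress through the steps instead of an external orchestrator.
response = client.chat.completions.create(
    model="gpt-4o",  # model choice is an assumption, not stated in the summary
    messages=[
        {"role": "system", "content": PROCEDURE},
        {"role": "user", "content": "I need two nights in Lisbon next weekend."},
    ],
)
print(response.choices[0].message.content)
```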
Key facts
- In-context prompting outperforms agent orchestration for procedural tasks.
- Study compared LangGraph, CrewAI, Google ADK, and OpenAI Agents SDK (see the orchestration sketch after this list).
- Domains tested: travel booking (14 nodes), Zoom support (14 nodes), insurance (55 nodes).
- In-context scored 4.53–5.00 vs. LangGraph's 4.17–4.84 on a 5-point scale.
- Orchestrated system failure rates: 24% travel, 9% Zoom, 17% insurance.
- In-context failure rates: 11.5% travel, 0.5% Zoom, 5% insurance.
- Research published on arXiv with ID 2604.27891.
- LLM-as-judge scoring used for evaluation.
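For contrast, an orchestrated version of the same workflow might be wired up as an explicit graph. The sketch below uses LangGraph's StateGraph API with hypothetical node names and stub logic standing in for the study's 14- and 55-node workflows.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class BookingState(TypedDict):
    request: str
    status: str


def check_availability(state: BookingState) -> dict:
    # Stub node: a real graph would query an inventory service here.
    return {"status": f"availability confirmed for: {state['request']}"}


def book(state: BookingState) -> dict:
    # Stub node: a real graph would call a booking API here.
    return {"status": state["status"] + "; booked"}


# Each procedural step becomes an explicit node, and control flow is
# fixed by edges rather than left to the model.
graph = StateGraph(BookingState)
graph.add_node("check_availability", check_availability)
graph.add_node("book", book)
graph.add_edge(START, "check_availability")
graph.add_edge("check_availability", "book")
graph.add_edge("book", END)

app = graph.compile()
print(app.invoke({"request": "two nights in Lisbon", "status": ""}))
```

Here every step and transition is hard-coded outside the model, the control style for which the study reports higher failure rates.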
Entities
Platforms
- arXiv
Frameworks
- LangGraph
- CrewAI
- Google ADK
- OpenAI Agents SDK