In-Context Prompting Outperforms Agent Orchestration for Procedural Tasks
A new study posted to arXiv (2604.27891) finds that in-context prompting, where the entire procedure is embedded in the system prompt and the LLM manages its own execution, outperforms external agent orchestration on procedural tasks. The researchers compared LangGraph, CrewAI, Google ADK, and the OpenAI Agents SDK against a self-managing LLM across three domains: travel booking (14 nodes), Zoom support (14 nodes), and insurance claims processing (55 nodes). Judged by an LLM-as-judge on five quality criteria, the in-context approach scored 4.53–5.00 on a 5-point scale, while LangGraph scored 4.17–4.84. The orchestrated system failed on 24% of travel tasks, 9% of Zoom tasks, and 17% of insurance tasks, versus 11.5%, 0.5%, and 5% for the in-context method.
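To make the contrast concrete, here is a minimal sketch of the in-context approach using the OpenAI chat completions API. The procedure text, model name, and travel scenario are illustrative assumptions, not the paper's actual prompts.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical procedure text; the study's real prompts are not reproduced here.
PROCEDURE = """You are a travel-booking assistant. Follow these steps in order:
1. Collect destination, dates, and budget.
2. Check availability before quoting a price.
3. Confirm all details with the user before booking.
4. If any step fails, apologize and escalate to a human agent."""

# The entire workflow lives in the system prompt; the model manages its own
# progress through the steps instead of an external orchestrator.
response = client.chat.completions.create(
    model="gpt-4o",  # model choice is an assumption, not stated in the summary
    messages=[
        {"role": "system", "content": PROCEDURE},
        {"role": "user", "content": "I need two nights in Lisbon next weekend."},
    ],
)
print(response.choices[0].message.content)
```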
Key facts
- In-context prompting outperforms agent orchestration for procedural tasks.
- Study compared LangGraph, CrewAI, Google ADK, and OpenAI Agents SDK (see the orchestration sketch after this list).
- Domains tested: travel booking (14 nodes), Zoom support (14 nodes), insurance (55 nodes).
- In-context scored 4.53–5.00 vs. LangGraph's 4.17–4.84 on a 5-point scale.
- Orchestrated system failure rates: 24% travel, 9% Zoom, 17% insurance.
- In-context failure rates: 11.5% travel, 0.5% Zoom, 5% insurance.
- Research published on arXiv with ID 2604.27891.
- LLM-as-judge scoring used for evaluation.
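For contrast, an orchestrated version of the same workflow might be wired up as an explicit graph. The sketch below uses LangGraph's StateGraph API with hypothetical node names and stub logic standing in for the study's 14- and 55-node workflows.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class BookingState(TypedDict):
    request: str
    status: str


def check_availability(state: BookingState) -> dict:
    # Stub node: a real graph would query an inventory service here.
    return {"status": f"availability confirmed for: {state['request']}"}


def book(state: BookingState) -> dict:
    # Stub node: a real graph would call a booking API here.
    return {"status": state["status"] + "; booked"}


# Each procedural step becomes an explicit node, and control flow is
# fixed by edges rather than left to the model.
graph = StateGraph(BookingState)
graph.add_node("check_availability", check_availability)
graph.add_node("book", book)
graph.add_edge(START, "check_availability")
graph.add_edge("check_availability", "book")
graph.add_edge("book", END)

app = graph.compile()
print(app.invoke({"request": "two nights in Lisbon", "status": ""}))
```

Here every step and transition is hard-coded outside the model, the control style for which the study reports higher failure rates.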
Entities
Platforms
- arXiv
Frameworks
- LangGraph
- CrewAI
- Google ADK
- OpenAI Agents SDK