Compile-and-Execute Architecture Cuts LLM Web Agent Inference Costs by 99.9%
A recent study published on arXiv (2604.09718) presents a Compile-and-Execute framework that targets the "Rerun Crisis" in LLM-based web automation: token expenditure and API latency grow linearly with how often a task is re-executed. For instance, a 5-step workflow rerun 500 times incurs roughly $150.00 in inference costs, or about $15.00 even with aggressive caching. The framework separates LLM reasoning from browser execution: a single LLM call operates on a token-efficient semantic representation produced by a DOM Sanitization Module (DSM) and emits a deterministic JSON workflow blueprint. A lightweight runtime then replays the blueprint in the browser without further model queries, cutting the per-workflow inference cost to under $0.10. The paper formalizes this cost reduction.
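The paper's exact blueprint schema is not reproduced here, so the action names, fields, and `StubBrowser` interface below are illustrative assumptions. The sketch shows the core idea: once the LLM has compiled a task into a deterministic JSON blueprint, reruns are a plain dispatch loop with zero model calls.

```python
# Hypothetical blueprint: structure and action vocabulary are assumptions,
# not the schema from arXiv 2604.09718.
blueprint = {
    "workflow": "login_example",
    "steps": [
        {"action": "goto",  "url": "https://example.com/login"},
        {"action": "type",  "selector": "#user", "text": "alice"},
        {"action": "click", "selector": "#submit"},
    ],
}

class StubBrowser:
    """Stand-in for a real driver (e.g. Playwright); records actions."""
    def __init__(self):
        self.log = []
    def goto(self, url): self.log.append(("goto", url))
    def type(self, selector, text): self.log.append(("type", selector, text))
    def click(self, selector): self.log.append(("click", selector))

def execute(blueprint, browser):
    """Replay a compiled blueprint deterministically -- no LLM queries."""
    for step in blueprint["steps"]:
        action = step["action"]
        if action == "goto":
            browser.goto(step["url"])
        elif action == "type":
            browser.type(step["selector"], step["text"])
        elif action == "click":
            browser.click(step["selector"])
        else:
            raise ValueError(f"unknown action: {action}")

browser = StubBrowser()
execute(blueprint, browser)
print(len(browser.log))  # 3 actions replayed, zero tokens spent
```

Because the blueprint is deterministic JSON, it can be stored and re-executed any number of times; only the initial compilation step touches the model.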
Key facts
- arXiv paper 2604.09718 proposes Compile-and-Execute architecture for LLM web agents
- Rerun Crisis: linear growth of token expenditure and API latency with execution frequency
- 5-step workflow over 500 iterations incurs ~$150.00 in inference costs
- Even with aggressive caching, cost remains near $15.00
- Proposed method reduces per-workflow inference cost to under $0.10
- One-shot LLM invocation uses token-efficient semantic representation from DOM Sanitization Module (DSM)
- Output is a deterministic JSON workflow blueprint
- Lightweight runtime drives browser without further model queries
Entities
Institutions
- arXiv