Compile-and-Execute Architecture Cuts LLM Web Agent Inference Costs by 99.9%
A recent study published on arXiv (2604.09718) presents a Compile-and-Execute framework that targets the "Rerun Crisis" in LLM-based web automation: token expenditure and API latency grow linearly with how often a task is re-executed. For instance, a 5-step workflow rerun 500 times incurs roughly $150.00 in inference costs, or about $15.00 even with aggressive caching. The framework separates LLM reasoning from browser execution: a single LLM call operates on a token-efficient semantic representation produced by a DOM Sanitization Module (DSM) and emits a deterministic JSON workflow blueprint. A lightweight runtime then replays the blueprint in the browser without further model queries, cutting the per-workflow inference cost to under $0.10. The paper formalizes this cost reduction.
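The paper's exact blueprint schema is not reproduced here, so the action names, fields, and `StubBrowser` interface below are illustrative assumptions. The sketch shows the core idea: once the LLM has compiled a task into a deterministic JSON blueprint, reruns are a plain dispatch loop with zero model calls.

```python
# Hypothetical blueprint: structure and action vocabulary are assumptions,
# not the schema from arXiv 2604.09718.
blueprint = {
    "workflow": "login_example",
    "steps": [
        {"action": "goto",  "url": "https://example.com/login"},
        {"action": "type",  "selector": "#user", "text": "alice"},
        {"action": "click", "selector": "#submit"},
    ],
}

class StubBrowser:
    """Stand-in for a real driver (e.g. Playwright); records actions."""
    def __init__(self):
        self.log = []
    def goto(self, url): self.log.append(("goto", url))
    def type(self, selector, text): self.log.append(("type", selector, text))
    def click(self, selector): self.log.append(("click", selector))

def execute(blueprint, browser):
    """Replay a compiled blueprint deterministically -- no LLM queries."""
    for step in blueprint["steps"]:
        action = step["action"]
        if action == "goto":
            browser.goto(step["url"])
        elif action == "type":
            browser.type(step["selector"], step["text"])
        elif action == "click":
            browser.click(step["selector"])
        else:
            raise ValueError(f"unknown action: {action}")

browser = StubBrowser()
execute(blueprint, browser)
print(len(browser.log))  # 3 actions replayed, zero tokens spent
```

Because the blueprint is deterministic JSON, it can be stored and re-executed any number of times; only the initial compilation step touches the model.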
Key facts
- arXiv paper 2604.09718 proposes Compile-and-Execute architecture for LLM web agents
- Rerun Crisis: linear growth of token expenditure and API latency with execution frequency
- 5-step workflow over 500 iterations incurs ~$150.00 in inference costs
- Even with aggressive caching, cost remains near $15.00
- Proposed method reduces per-workflow inference cost to under $0.10
- One-shot LLM invocation uses token-efficient semantic representation from DOM Sanitization Module (DSM)
- Output is a deterministic JSON workflow blueprint
- Lightweight runtime drives browser without further model queries
Entities
Institutions
- arXiv