LLM Web Agents Fail at Low-Level Execution, Not High-Level Planning
A recent study posted on arXiv (2603.14248) finds that web agents built on large language models (LLMs) struggle more with low-level execution than with high-level planning. The authors introduce a hierarchical planning framework that evaluates agents at three layers: high-level planning, low-level execution, and replanning. Plans written in the structured Planning Domain Definition Language (PDDL) are more concise and goal-directed than natural-language (NL) plans. Even so, low-level execution remains the dominant bottleneck, and the authors argue that better perceptual grounding and adaptive control are needed before such agents can reach human-level reliability. The study also argues for process-based evaluation, which examines each layer separately, rather than relying solely on end-to-end success rates.
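The summary does not describe the paper's actual scoring procedure. As a hedged illustration of what process-based evaluation means in general, the Python sketch below scores each of the three layers separately alongside the end-to-end success rate. The `EpisodeTrace` record and its fields, the `layer_report` and `bottleneck` helpers, and the specific per-layer metrics (plan validity rate, step-level accuracy, recovery rate) are all hypothetical choices made for this example, not the authors' definitions. The sketch also assumes each episode has already been labeled layer by layer, by a human annotator or an automatic checker.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EpisodeTrace:
    """Hypothetical per-episode record; all field names are illustrative."""
    plan_ok: bool          # high-level plan judged valid and goal-directed
    steps_attempted: int   # low-level actions issued (click, type, scroll, ...)
    steps_succeeded: int   # actions that hit the intended element with the intended effect
    deviations: int        # times execution drifted off the plan
    recoveries: int        # deviations that replanning got back on track
    task_success: bool     # end-to-end outcome, the usual single benchmark signal


def layer_report(episodes: List[EpisodeTrace]) -> Dict[str, float]:
    """Score each layer separately instead of reporting one success rate."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    attempted = sum(e.steps_attempted for e in episodes)
    succeeded = sum(e.steps_succeeded for e in episodes)
    deviations = sum(e.deviations for e in episodes)
    recoveries = sum(e.recoveries for e in episodes)
    return {
        "end_to_end": sum(e.task_success for e in episodes) / n,
        "planning": sum(e.plan_ok for e in episodes) / n,
        # Step-level accuracy; 1.0 when no steps were attempted at all.
        "execution": succeeded / attempted if attempted else 1.0,
        # Recovery rate; 1.0 when nothing ever needed recovering.
        "replanning": recoveries / deviations if deviations else 1.0,
    }


def bottleneck(report: Dict[str, float]) -> str:
    """Name the lowest per-layer score (the end-to-end rate is ignored)."""
    layers = {k: v for k, v in report.items() if k != "end_to_end"}
    return min(layers, key=lambda k: layers[k])


if __name__ == "__main__":
    # Toy traces for illustration only; these are not data from the study.
    toy = [
        EpisodeTrace(True, 10, 6, 4, 3, False),
        EpisodeTrace(True, 8, 8, 0, 0, True),
        EpisodeTrace(True, 10, 6, 4, 3, False),
        EpisodeTrace(False, 2, 2, 0, 0, False),
    ]
    report = layer_report(toy)
    print(report)
    print("lowest-scoring layer:", bottleneck(report))
```

With the toy traces, one success in four episodes gives an end-to-end rate of 0.25, which on its own says nothing about why the other three failed. The per-layer scores separate the episode that failed at planning from the two whose low-level actions drifted off plan and were not fully recovered. Any real breakdown depends on how each layer is labeled and which metric is chosen per layer.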
Key facts
- Study analyzes LLM web agents using a hierarchical planning framework
- Three layers examined: high-level planning, low-level execution, replanning
- PDDL plans produce more concise and goal-directed strategies than NL plans
- Low-level execution is the dominant bottleneck
- Improving perceptual grounding and adaptive control is critical for reaching human-level reliability
- Existing evaluations focus on end-to-end success, which offers limited insight into where agents fail
- Research published on arXiv with ID 2603.14248
- Study provides a principled foundation for diagnosing agent failures
Entities
Institutions
- arXiv