LLM Web Agents Fail at Low-Level Execution, Not High-Level Planning

ai-technology · 2026-04-30

A recent paper on arXiv (2603.14248) reports that web agents built on large language models (LLMs) struggle more with low-level execution than with high-level planning. The authors introduce a hierarchical planning framework that evaluates agents across three layers: high-level planning, low-level execution, and replanning. They find that structured plans written in Planning Domain Definition Language (PDDL) are more concise and goal-directed than natural language (NL) plans. Even so, low-level execution remains the dominant bottleneck, which points to perceptual grounding and adaptive control as the areas most in need of improvement before agents reach human-level reliability. The authors argue for process-level evaluation rather than reliance on end-to-end success metrics alone.
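To make the contrast between end-to-end and process-level evaluation concrete, here is a minimal Python sketch. It is not the paper's framework: the `EpisodeTrace` schema, the layer identifiers, and the sample traces are all hypothetical, invented only to show how attributing each failed episode to the first failing layer gives more diagnostic detail than a single success rate.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical layer identifiers, mirroring the three layers named in the summary.
LAYERS = ("high_level_planning", "low_level_execution", "replanning")


@dataclass
class EpisodeTrace:
    """Per-layer outcomes for one task attempt (illustrative schema, not the paper's)."""
    task_id: str
    layer_ok: dict       # layer name -> True if that layer did its job
    task_success: bool   # the usual end-to-end outcome


def first_failing_layer(trace):
    """Return the earliest layer that failed, or None if every layer succeeded."""
    for layer in LAYERS:
        if not trace.layer_ok.get(layer, True):
            return layer
    return None


def summarize(traces):
    """Compute the end-to-end success rate and a per-layer failure tally."""
    failures = Counter()
    for trace in traces:
        layer = first_failing_layer(trace)
        if layer is not None:
            failures[layer] += 1
    success_rate = sum(t.task_success for t in traces) / len(traces)
    return success_rate, dict(failures)


# Made-up traces for illustration only; these are not results from the study.
traces = [
    EpisodeTrace("t1", {"high_level_planning": True, "low_level_execution": True, "replanning": True}, True),
    EpisodeTrace("t2", {"high_level_planning": True, "low_level_execution": False, "replanning": False}, False),
    EpisodeTrace("t3", {"high_level_planning": False, "low_level_execution": False, "replanning": False}, False),
    EpisodeTrace("t4", {"high_level_planning": True, "low_level_execution": False, "replanning": True}, False),
]

success_rate, failures = summarize(traces)
print(f"end-to-end success rate: {success_rate:.2f}")
print(f"first failing layer counts: {failures}")
```

The success rate alone says only that most attempts failed. The per-layer tally shows where they failed first, which is the kind of diagnostic signal the authors argue end-to-end metrics do not provide.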

Key facts

Study analyzes LLM web agents using a hierarchical planning framework
Three layers examined: high-level planning, low-level execution, replanning
PDDL plans produce more concise and goal-directed strategies than NL plans
Low-level execution is the dominant bottleneck
Improving perceptual grounding and adaptive control is critical
Existing evaluations focus on end-to-end success, offering limited insight into where agents fail
Research published on arXiv with ID 2603.14248
Study provides a principled foundation for diagnosing agent failures
