ARTFEED — Contemporary Art Intelligence

LLM Web Agents Fail at Low-Level Execution, Not High-Level Planning

ai-technology · 2026-04-30

A recent arXiv preprint (2603.14248) finds that web agents built on large language models (LLMs) struggle more with low-level execution than with high-level reasoning. The authors introduce a hierarchical planning framework that assesses agents at three layers: high-level planning, low-level execution, and replanning. Structured plans written in the Planning Domain Definition Language (PDDL) turn out to be more concise and goal-directed than natural language (NL) plans. Even so, low-level execution remains the dominant bottleneck, which points to perceptual grounding and adaptive control as the areas that must improve before agents reach human-level reliability. The authors argue for process-based evaluation rather than reliance on end-to-end success metrics alone.
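To make the contrast between end-to-end and process-based evaluation concrete, here is a minimal Python sketch. It is not the paper's code: the EpisodeTrace record, the layer labels, and the sample traces are hypothetical, chosen only to mirror the three layers named above. It reports the overall success rate alongside a count of failures attributed to each layer.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Layer names mirror the three layers described in the article;
# the identifiers themselves are assumptions made for this sketch.
LAYERS = ("high_level_planning", "low_level_execution", "replanning")


@dataclass
class EpisodeTrace:
    """One agent episode, labeled with the layer where it failed (if any)."""
    task_id: str
    succeeded: bool
    failed_layer: Optional[str] = None  # None when the episode succeeded


def summarize(traces):
    """Return (end-to-end success rate, failure count per layer)."""
    if not traces:
        raise ValueError("no traces to summarize")
    success_rate = sum(t.succeeded for t in traces) / len(traces)
    failures = Counter(t.failed_layer for t in traces if not t.succeeded)
    return success_rate, {layer: failures.get(layer, 0) for layer in LAYERS}


# Made-up traces for illustration only; these are not results from the paper.
traces = [
    EpisodeTrace("t1", True),
    EpisodeTrace("t2", False, "low_level_execution"),
    EpisodeTrace("t3", False, "low_level_execution"),
    EpisodeTrace("t4", False, "high_level_planning"),
]

rate, by_layer = summarize(traces)
print(f"end-to-end success rate: {rate:.2f}")
for layer, count in by_layer.items():
    print(f"{layer}: {count} failure(s)")
```

The success rate alone says only that most episodes failed; the per-layer counts show where they failed, which is the kind of diagnostic signal a process-based evaluation is meant to provide.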

Key facts

  • Study analyzes LLM web agents using a hierarchical planning framework
  • Three layers examined: high-level planning, low-level execution, replanning
  • PDDL plans produce more concise and goal-directed strategies than NL plans
  • Low-level execution is the dominant bottleneck
  • Improving perceptual grounding and adaptive control is critical
  • Existing evaluations focus on end-to-end success, offering limited insight into where agents fail
  • Research published on arXiv with ID 2603.14248
  • Study provides a principled foundation for diagnosing agent failures

Entities

Institutions

  • arXiv
