Region4Web: Rethinking Observation Space Granularity for Web Agents
Region4Web is a framework proposing that web agents perceive pages at the granularity of functional regions rather than individual elements. It addresses an underexamined design choice in web agents, the granularity of the observation space, and argues that element-level observation forces agents to infer a page's functional organization implicitly. The approach reorganizes the accessibility tree (AXTree) into functional regions through hierarchical decomposition and semantic abstraction. A companion inference pipeline, PageDigest, delivers these region-level observations as a compact per-page digest that persists across steps. The framework is evaluated on the WebArena benchmark.
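To make the idea of region-level observation concrete, here is a minimal sketch of grouping accessibility-tree nodes into regions and abstracting each region to a one-line summary. It is not the paper's algorithm, which this summary does not describe in detail. The `AXNode` and `Region` types, the `decompose` and `summarize` functions, and the choice of ARIA landmark roles as region boundaries are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumption: ARIA landmark roles mark region boundaries. The paper's actual
# decomposition criteria are not given in this summary.
REGION_ROLES = {"banner", "navigation", "search", "main", "form", "contentinfo"}
INTERACTIVE_ROLES = {"link", "button", "textbox", "combobox", "checkbox"}


@dataclass
class AXNode:
    """Hypothetical, simplified accessibility-tree node."""
    role: str
    name: str = ""
    children: list["AXNode"] = field(default_factory=list)


@dataclass
class Region:
    """Hypothetical functional region: a landmark plus its interactive elements."""
    role: str
    name: str
    elements: list[AXNode] = field(default_factory=list)


def decompose(node, current=None, regions=None):
    """Walk the tree in pre-order, opening a new region at each landmark node.

    Interactive elements are assigned to the nearest enclosing region.
    """
    if regions is None:
        regions = []
    if node.role in REGION_ROLES:
        current = Region(node.role, node.name)
        regions.append(current)
    elif current is not None and node.role in INTERACTIVE_ROLES:
        current.elements.append(node)
    for child in node.children:
        decompose(child, current, regions)
    return regions


def summarize(region):
    """Abstract a region to one line: its label plus counts of element roles."""
    counts = {}
    for element in region.elements:
        counts[element.role] = counts.get(element.role, 0) + 1
    parts = ", ".join(f"{n} {role}" for role, n in sorted(counts.items()))
    label = f"{region.role} '{region.name}'" if region.name else region.role
    return f"[{label}] {parts or 'no interactive elements'}"
```

Under these assumptions, an agent would read one summary line per region instead of one line per element, and would expand a region into its elements only when it needs to act inside it.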
Key facts
- Region4Web reorganizes the AXTree into functional regions.
- PageDigest is a web-specific inference pipeline for region-level observation.
- Observation granularity is an underexamined design choice in web agents.
- Existing work observes pages at the same element-level granularity as the action space.
- Region4Web uses hierarchical decomposition and semantic abstraction.
- PageDigest delivers a compact per-page digest that persists across steps.
- The framework is evaluated on the WebArena benchmark.
- The paper is available on arXiv with ID 2605.07134.
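The notion of a per-page digest that persists across steps can be sketched as a small cache. This is only an illustration of the persistence idea, not PageDigest itself: the `DigestCache` class, its `observe` method, the `build` callable, and the choice of the URL as the page key are all hypothetical.

```python
from typing import Callable


class DigestCache:
    """Hypothetical store that builds a page digest once and reuses it afterwards."""

    def __init__(self, build: Callable[[str], str]):
        self._build = build                    # assumed digest builder: page source -> digest text
        self._digests: dict[str, str] = {}     # assumption: pages are keyed by URL
        self.builds = 0                        # number of times a digest was actually built

    def observe(self, url: str, page_source: str) -> str:
        """Return the stored digest for this page, building it on the first visit only."""
        if url not in self._digests:
            self._digests[url] = self._build(page_source)
            self.builds += 1
        return self._digests[url]
```

Keying by URL is a simplification: a page whose content changes without a URL change would need an invalidation rule, which this sketch leaves out.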
Entities
Institutions
- arXiv