DocOS Benchmark Tests GUI Agents with Document-Guided Tasks

ai-technology · 2026-05-20

Researchers have introduced DocOS, a benchmark that evaluates whether GUI agents can proactively search documentation to solve long-tailed tasks in dynamic web environments. Current GUI agents rely on static parametric knowledge acquired during pre-training, which limits their ability to handle tasks that require explicit procedural knowledge. DocOS requires agents to navigate web interfaces autonomously and use the documents they find to guide their actions, mirroring human problem-solving. The benchmark is described in a paper on arXiv (2605.18048).

Key facts

DocOS benchmark assesses document-guided problem solving in GUI agents
Current GUI agents depend on static parametric knowledge from pre-training
DocOS requires agents to autonomously search for relevant documentation
The paradigm is called Proactive Document-Guided Action
The benchmark operates in fully interactive, dynamic open-web environments
The paper is available on arXiv with ID 2605.18048
The approach mirrors human problem-solving by using documentation
The work addresses long-tailed tasks absent from model parameters
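
To make the idea of document-guided action concrete, here is a toy sketch of the control flow the facts above describe: try what the agent already knows, and fall back to searching documentation when the task is not covered. It is not taken from the DocOS paper; the ToyAgent class, its fields, the keyword-overlap search, and the sample task are all hypothetical stand-ins, and a real agent would drive a browser rather than look things up in a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyAgent:
    # Hypothetical stand-in for parametric knowledge: task -> known steps.
    known: Dict[str, List[str]]
    # Hypothetical documentation corpus: page title -> ordered steps.
    docs: Dict[str, List[str]]
    # Trace of what the agent read and did.
    log: List[str] = field(default_factory=list)

    def search_docs(self, task: str) -> Optional[List[str]]:
        """Return steps from the first doc page whose title shares a word with the task."""
        task_words = set(task.lower().split())
        for title, steps in self.docs.items():
            if task_words & set(title.lower().split()):
                self.log.append(f"read doc: {title}")
                return steps
        return None

    def solve(self, task: str) -> List[str]:
        # 1. Try built-in knowledge first.
        steps = self.known.get(task)
        # 2. If the task is not covered, proactively look for documentation.
        if steps is None:
            steps = self.search_docs(task)
        # 3. Give up if neither source provides a procedure.
        if steps is None:
            self.log.append("gave up")
            return []
        # 4. "Execute" each step (here: just record it).
        for step in steps:
            self.log.append(f"act: {step}")
        return steps


agent = ToyAgent(
    known={"open settings": ["click gear icon"]},
    docs={"Export billing report": ["open Billing", "click Export", "choose CSV"]},
)
steps = agent.solve("export report")
print(steps)
print(agent.log)
```

In this sketch the "export report" task is absent from the agent's built-in knowledge, so the agent reads the matching documentation page and follows its steps, which is the behaviour the benchmark is described as testing.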
