ARTFEED — Contemporary Art Intelligence

DocOS Benchmark Tests GUI Agents with Document-Guided Tasks

ai-technology · 2026-05-20

Researchers have introduced DocOS, a benchmark that evaluates whether GUI agents can proactively search documentation to solve long-tailed tasks in dynamic web environments. Current GUI agents rely on static parametric knowledge acquired during pre-training, which limits their ability to handle tasks that require explicit procedural knowledge. DocOS instead requires agents to navigate web interfaces autonomously and use documents to guide their actions, mirroring how humans solve unfamiliar problems. The benchmark is described in a paper posted on arXiv (2605.18048).

Key facts

  • DocOS benchmark assesses document-guided problem solving in GUI agents
  • Current GUI agents depend on static parametric knowledge from pre-training
  • DocOS requires agents to autonomously search for relevant documentation
  • The paradigm is called Proactive Document-Guided Action
  • The benchmark operates in fully interactive, dynamic open-web environments
  • The paper is available on arXiv with ID 2605.18048
  • The approach mirrors human problem-solving by using documentation
  • The work targets long-tailed tasks whose required knowledge is absent from model parameters

Entities

Institutions

  • arXiv