ARTFEED — Contemporary Art Intelligence

SimuWoB: Synthetic Benchmark for Mobile GUI Agents

ai-technology · 2026-05-26

SimuWoB is a fully synthetic benchmark for mobile GUI agents, intended to close the gap between existing benchmarks and real-world use. It contains 120 challenging tasks spanning diverse types and difficulty levels, all synthesized by a virtual environment generation framework that automatically provides a valid reward for each task. Each environment is deployed as a backend-free webpage accessible via a URL, which makes it practical to evaluate agents powered by large language models on complex, long-horizon interactions.

Key facts

  • SimuWoB is a fully synthetic benchmark for mobile GUI agents
  • It includes 120 challenging tasks
  • Tasks span diverse types and difficulty levels
  • A robust virtual environment generation framework synthesizes tasks and environments
  • The framework automatically provides valid rewards for each task
  • Each environment is deployed as a backend-free webpage accessible via URL
  • It addresses limitations of existing benchmarks that focus on open-source apps or file-operation tasks
  • Existing benchmarks have limited coverage of complex, long-horizon interactions
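
To make the idea of an automatically provided reward concrete, here is a minimal Python sketch. It is not SimuWoB's actual code or API: the names SyntheticTask, ToyEnvironment, run_episode, and the toy Wi-Fi task are all hypothetical, and the in-memory state dictionary only stands in for a self-contained environment page. The assumption illustrated is that each synthesized task ships with its own success check, so an episode can be scored without human grading.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
Action = Tuple[str, str, str]  # (kind, target, value); hypothetical action format


@dataclass
class SyntheticTask:
    """Hypothetical task record: an instruction, a start state, and a reward check."""
    instruction: str
    initial_state: State
    # Assumption: the check is generated together with the task,
    # so the reward can be computed automatically from the final state.
    is_success: Callable[[State], bool]


@dataclass
class ToyEnvironment:
    """Stand-in for one self-contained environment; all state lives in memory."""
    task: SyntheticTask
    state: State = field(default_factory=dict)

    def reset(self) -> State:
        # Start every episode from a fresh copy of the task's initial state.
        self.state = dict(self.task.initial_state)
        return dict(self.state)

    def step(self, action: Action) -> State:
        kind, target, value = action
        if kind == "type":
            self.state[target] = value
        elif kind == "tap":
            # A tap toggles the target between "on" and "off".
            self.state[target] = "on" if self.state.get(target) != "on" else "off"
        return dict(self.state)

    def reward(self) -> float:
        # Binary reward derived from the task's own success check.
        return 1.0 if self.task.is_success(self.state) else 0.0


def run_episode(env: ToyEnvironment, actions: List[Action]) -> float:
    """Replay a fixed action list and return the automatic reward."""
    env.reset()
    for action in actions:
        env.step(action)
    return env.reward()


task = SyntheticTask(
    instruction="Turn on Wi-Fi and set the device name to 'Pixel'.",
    initial_state={"wifi": "off", "device_name": ""},
    is_success=lambda s: s.get("wifi") == "on" and s.get("device_name") == "Pixel",
)
env = ToyEnvironment(task)
full_reward = run_episode(env, [("tap", "wifi", ""), ("type", "device_name", "Pixel")])
partial_reward = run_episode(env, [("tap", "wifi", "")])
print(full_reward, partial_reward)
```

In this toy setup, an agent that completes only part of a multi-step instruction receives no reward, which is one simple way a benchmark could score long-horizon tasks. A real agent would choose its actions from observations of the page rather than replaying a fixed list.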

Entities

Institutions

  • arXiv
