SimuWoB: Synthetic Benchmark for Mobile GUI Agents

ai-technology · 2026-05-26

SimuWoB is a fully synthetic benchmark for mobile GUI agents, intended to narrow the gap between existing benchmarks and practical applications. It comprises 120 challenging tasks spanning diverse types and difficulty levels, all synthesized by a virtual environment generation framework that automatically provides valid rewards for each task. Each environment is deployed as a backend-free webpage accessible via URL, which makes it practical to evaluate agents powered by large language models on complex, long-horizon interactions.

Key facts

SimuWoB is a fully synthetic benchmark for mobile GUI agents
It includes 120 challenging tasks
Tasks span diverse types and difficulty levels
A robust virtual environment generation framework synthesizes tasks and environments
The framework automatically provides valid rewards for each task
Each environment is deployed as a backend-free webpage accessible via URL
It addresses limitations of existing benchmarks that focus on open-source apps or file-operation tasks
Existing benchmarks have limited coverage of complex, long-horizon interactions
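
As a rough illustration of how an environment that supplies its own reward can be scored over a multi-step episode, here is a minimal Python sketch. It does not reproduce SimuWoB's actual interface, which this summary does not describe: the ToyEnvironment class, the run_episode and success_rate helpers, the action strings, and the example.invalid URL are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ToyEnvironment:
    """Hypothetical stand-in for one task environment (not SimuWoB's API)."""
    url: str          # placeholder address; a real environment would be a webpage
    goal: List[str]   # action sequence that completes the toy task
    history: List[str] = field(default_factory=list)

    def step(self, action: str) -> Tuple[float, bool]:
        """Record an action and return (reward, done)."""
        self.history.append(action)
        done = len(self.history) >= len(self.goal)
        # The environment derives the reward from its own state,
        # so no external grader or backend is involved.
        reward = 1.0 if done and self.history == self.goal else 0.0
        return reward, done


Policy = Callable[[List[str]], str]  # maps the action history to the next action


def run_episode(env: ToyEnvironment, policy: Policy, max_steps: int = 20) -> float:
    """Run one episode and return the final reward (0.0 if the step budget runs out)."""
    for _ in range(max_steps):
        reward, done = env.step(policy(env.history))
        if done:
            return reward
    return 0.0


def success_rate(envs: List[ToyEnvironment], policy: Policy) -> float:
    """Average final reward across a list of environments."""
    rewards = [run_episode(env, policy) for env in envs]
    return sum(rewards) / len(rewards)


if __name__ == "__main__":
    script = ["tap:search", "type:milk", "tap:add_to_cart"]
    env = ToyEnvironment(url="https://example.invalid/task-1", goal=list(script))
    print(run_episode(env, lambda history: script[len(history)]))
```

In this toy setup the reward is sparse: the agent receives credit only when the whole action sequence matches the goal, which is one simple way to model a long-horizon task.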
