OpenComputer Framework for Verifiable Software Worlds

other · 2026-05-20

OpenComputer is a verifier-grounded framework for building verifiable software environments for computer-use agents. It comprises four components: app-specific state verifiers that expose structured inspection endpoints over real applications, a self-evolving verification layer that improves verifier reliability using execution-grounded feedback, a task-generation pipeline that synthesizes realistic, machine-checkable desktop tasks, and an evaluation harness that records full trajectories and computes auditable partial-credit rewards. It currently covers 33 desktop applications and 1,000 finalized tasks spanning browsers, office tools, creative software, development environments, file managers, and communication applications. The authors report that OpenComputer's hard-coded verifiers align more closely with human adjudication than LLM-as-judge evaluation does, particularly when success hinges on subtle criteria.
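To make the idea of a hard-coded state verifier concrete, here is a minimal sketch in Python. It is not OpenComputer's actual API: the `State` snapshot, the `Check` type, the `verify_state` function, and the spreadsheet fields are all hypothetical names chosen for illustration. The sketch assumes an inspection endpoint has already returned the application's state as a dictionary.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical: a snapshot of application state, as an inspection
# endpoint might return it. The real schema is not given in the source.
State = Dict[str, Any]


@dataclass
class Check:
    """One named, deterministic predicate over the inspected state."""
    name: str
    predicate: Callable[[State], bool]


def verify_state(state: State, checks: List[Check]) -> Dict[str, bool]:
    """Run every check and return a per-check pass/fail map."""
    results: Dict[str, bool] = {}
    for check in checks:
        try:
            results[check.name] = bool(check.predicate(state))
        except (KeyError, TypeError, IndexError):
            # A missing or malformed field counts as a failed check.
            results[check.name] = False
    return results


# Hypothetical spreadsheet task: fill B2, rename the sheet, save the file.
checks = [
    Check("cell_B2_is_total", lambda s: s["cells"]["B2"] == 42),
    Check("sheet_renamed", lambda s: s["sheet_name"] == "Summary"),
    Check("file_saved", lambda s: s["dirty"] is False),
]

state = {"cells": {"B2": 42}, "sheet_name": "Sheet1", "dirty": False}
results = verify_state(state, checks)
print(results)
```

Because each check is an explicit predicate over inspected state, a reviewer can see exactly which criterion failed, which is harder to recover from a free-form LLM judgment.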

Key facts

OpenComputer is a verifier-grounded framework for computer-use agents.
It integrates four components: app-specific state verifiers, a self-evolving verification layer, a task-generation pipeline, and an evaluation harness.
App-specific state verifiers expose structured inspection endpoints over real applications.
The self-evolving verification layer improves verifier reliability using execution-grounded feedback.
The task-generation pipeline synthesizes realistic and machine-checkable desktop tasks.
The evaluation harness records full trajectories and computes auditable partial-credit rewards.
OpenComputer covers 33 desktop applications and 1,000 finalized tasks.
Tasks span browsers, office tools, creative software, development environments, file managers, and communication applications.
Hard-coded verifiers align more closely with human adjudication than LLM-as-judge evaluation does, especially when success depends on subtle criteria.
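The harness behavior described above (recording full trajectories and computing auditable partial-credit rewards) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the source does not say how partial credit is calculated, so the weighted-sum scheme, the `Trajectory` and `Step` types, and the `partial_credit` function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    action: str        # what the agent did
    observation: str   # what the environment reported back


@dataclass
class Trajectory:
    """Hypothetical full record of one agent run on one task."""
    task_id: str
    steps: List[Step] = field(default_factory=list)

    def record(self, action: str, observation: str) -> None:
        self.steps.append(Step(action, observation))


def partial_credit(check_results: Dict[str, bool],
                   weights: Dict[str, float]) -> dict:
    """Assumed scheme: reward is the weighted fraction of passed checks.

    The per-check breakdown is returned with the score so that the
    reward can be audited afterwards.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    breakdown = {
        name: {"weight": w, "passed": check_results.get(name, False)}
        for name, w in weights.items()
    }
    earned = sum(e["weight"] for e in breakdown.values() if e["passed"])
    return {"reward": earned / total, "breakdown": breakdown}


traj = Trajectory(task_id="demo-task")
traj.record("click(cell='B2')", "cell B2 focused")
traj.record("type('42')", "cell B2 = 42")

# Hypothetical verifier output and weights for the same task.
check_results = {"cell_filled": True, "sheet_renamed": False, "file_saved": True}
weights = {"cell_filled": 0.5, "sheet_renamed": 0.25, "file_saved": 0.25}

report = partial_credit(check_results, weights)
print(len(traj.steps), report["reward"])
```

Keeping the breakdown alongside the scalar reward is what makes the score auditable in this sketch: each point of credit traces back to a named check.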

Entities

—

Sources

arXiv cs.AI — 2026-05-20