Agent-desktop: CLI for structured desktop automation via OS accessibility APIs

other · 2026-05-02

Agent-desktop is a cross-platform command-line tool that lets AI agents automate desktop tasks by reading structured UI information through operating system accessibility APIs instead of predicting pixel coordinates from screenshots. It supports macOS (Accessibility API), Windows (UI Automation), and Linux (AT-SPI). The tool launched quietly about a month ago and has 122 stars on GitHub. The developer argues that the screenshot-based methods used by tools like Codex, Claude Code, and CUA are slow, token-expensive, and fragile, because they break when the UI shifts by a few pixels. Agent-desktop instead uses the same structured data that screen readers have relied on for years, an approach the developer compares to how Playwright improved on screenshot scraping for web automation.
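To make the fragility argument concrete, here is a minimal Python sketch of looking up a control by role and name in a structured UI tree. It is not agent-desktop's actual interface or output format: the `Node` type, the `find` helper, and the `sample_window` data are all hypothetical stand-ins for the kind of tree an accessibility API exposes. The point it illustrates is that a semantic lookup still succeeds after the layout shifts, whereas a hard-coded pixel coordinate would not.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical accessibility node: a role, a label, and screen bounds."""
    role: str
    name: str = ""
    bounds: Tuple[int, int, int, int] = (0, 0, 0, 0)  # x, y, width, height
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth-first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: Node, role: str, name: str) -> Optional[Node]:
    """Return the first node matching role and name, or None if absent."""
    for node in walk(root):
        if node.role == role and node.name == name:
            return node
    return None


def sample_window(offset_x: int = 0) -> Node:
    """Build an illustrative window; offset_x simulates a layout shift."""
    return Node("window", "Settings", (0, 0, 800, 600), [
        Node("group", "Toolbar", (0, 0, 800, 40), [
            Node("button", "Save", (10 + offset_x, 5, 60, 30)),
            Node("button", "Cancel", (80 + offset_x, 5, 60, 30)),
        ]),
        Node("textfield", "Username", (10, 60, 200, 24)),
    ])


if __name__ == "__main__":
    for shift in (0, 7):
        save = find(sample_window(shift), "button", "Save")
        # The lookup is by meaning, so it survives the 7-pixel shift.
        print(shift, save.bounds if save else None)
```

In this sketch the agent never needs to know where the button is drawn; the bounds are simply a property of the node it found, which is the contrast the developer draws with coordinate prediction from screenshots.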

Key facts

Agent-desktop is a cross-platform CLI for structured desktop automation.
It uses OS accessibility APIs: macOS Accessibility API, Windows UI Automation, Linux AT-SPI.
Launched about a month ago, currently 122 stars on GitHub.
Positioned as an alternative to screenshot-based agents like Codex, Claude Code, and CUA.
Screenshot methods are described as slow, expensive in tokens, and fragile.
Structured access is compared to Playwright's advantage over screenshot scraping on the web.
Tool is available at https://github.com/lahfir/agent-desktop.
Developer has been building computer-use tools for a while.
