Agent-desktop: CLI for structured desktop automation via OS accessibility APIs
Agent-desktop is a cross-platform command-line tool that lets AI agents automate desktop tasks by reading structured UI information from operating system accessibility APIs instead of predicting pixel positions from screenshots. It supports macOS (Accessibility API), Windows (UI Automation), and Linux (AT-SPI). The project launched quietly about a month ago and has 122 stars on GitHub. The developer argues that the screenshot-based methods used by tools like Codex, Claude Code, and CUA are slow, token-expensive, and fragile, because they break when the UI shifts by a few pixels. Agent-desktop instead uses the same structured data that screen readers have relied on for years, much as Playwright improved web automation over screenshot scraping.
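The fragility argument can be illustrated with a small, self-contained sketch. It does not use agent-desktop's actual commands or output format, which are not described here. The `Node` type, the `find` and `hit_test` helpers, and the sample window are all hypothetical stand-ins for an accessibility tree. The sketch shows that a lookup by role and name keeps working after a button moves, while a click at remembered pixel coordinates does not.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """Hypothetical accessibility node: a role, a label, and screen bounds."""
    role: str
    name: str
    bounds: tuple[int, int, int, int]  # x, y, width, height
    children: list[Node] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: Node, role: str, name: str) -> Optional[Node]:
    """Structured lookup: match on role and name, ignoring position."""
    for node in walk(root):
        if node.role == role and node.name == name:
            return node
    return None


def hit_test(root: Node, x: int, y: int) -> Optional[Node]:
    """Pixel lookup: the last node in document order that contains (x, y)."""
    hit = None
    for node in walk(root):
        bx, by, bw, bh = node.bounds
        if bx <= x < bx + bw and by <= y < by + bh:
            hit = node
    return hit


def make_window(save_y: int) -> Node:
    """Build an illustrative window whose Save button sits at a given y offset."""
    return Node("window", "Editor", (0, 0, 800, 600), [
        Node("button", "Save", (700, save_y, 80, 30)),
        Node("text", "Document body", (0, 100, 800, 500)),
    ])


before = make_window(save_y=20)
after = make_window(save_y=60)  # the same UI after the button moved down 40 px

# A coordinate remembered from the first layout misses after the shift.
print(hit_test(before, 740, 35).name)  # Save
print(hit_test(after, 740, 35).name)   # Editor

# The structured lookup finds the button in both layouts.
print(find(before, "button", "Save").bounds)  # (700, 20, 80, 30)
print(find(after, "button", "Save").bounds)   # (700, 60, 80, 30)
```

A real accessibility tree carries far more than this (states, actions, focus, nested containers), but the point is the same: the agent addresses an element by what it is, not by where it happened to be drawn.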
Key facts
- Agent-desktop is a cross-platform CLI for structured desktop automation.
- It uses OS accessibility APIs: macOS Accessibility API, Windows UI Automation, Linux AT-SPI.
- It launched about a month ago and currently has 122 stars on GitHub.
- It contrasts with screenshot-based agents such as Codex, Claude Code, and CUA.
- Screenshot methods are described as slow, expensive in tokens, and fragile.
- Structured access is compared to Playwright's advantage over screenshot scraping on the web.
- Tool is available at https://github.com/lahfir/agent-desktop.
- Developer has been building computer-use tools for a while.
Entities
Institutions
- GitHub