ARTFEED — Contemporary Art Intelligence

Agent-desktop: CLI for structured desktop automation via OS accessibility APIs

other · 2026-05-02

Agent-desktop is a cross-platform command-line tool that lets AI agents automate desktop tasks by reading structured UI information through operating system accessibility APIs, rather than relying on screenshot-based pixel prediction. The tool supports macOS (Accessibility API), Windows (UI Automation), and Linux (AT-SPI). It was quietly launched about a month ago and has 122 stars on GitHub. The developer argues that the screenshot-based methods used by tools like Codex, Claude Code, and CUA are slow, token-expensive, and fragile, because they break when the UI shifts by a few pixels. By contrast, agent-desktop uses the same structured data that screen readers have relied on for years, an approach the developer compares to how Playwright improved on screenshot scraping for web automation.
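The difference between the two approaches can be illustrated with a minimal Python sketch. It is not agent-desktop's code or API: the `Node` class, the `find` and `center` helpers, and the sample "Save dialog" window are all hypothetical and stand in for the kind of role-and-name tree that accessibility APIs expose. The sketch looks up a button by role and name, so the lookup still succeeds after the layout moves down by 12 pixels, whereas a hard-coded click coordinate would miss.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical accessibility node: a role, a label, and screen bounds."""
    role: str
    name: str = ""
    bounds: Tuple[int, int, int, int] = (0, 0, 0, 0)  # x, y, width, height
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: Node, role: str, name: str) -> Optional[Node]:
    """Return the first node matching role and name, or None."""
    for node in walk(root):
        if node.role == role and node.name == name:
            return node
    return None


def center(node: Node) -> Tuple[int, int]:
    """Compute the click point in the middle of a node's bounds."""
    x, y, w, h = node.bounds
    return (x + w // 2, y + h // 2)


def build_window(offset_y: int) -> Node:
    """Build a made-up dialog; offset_y simulates a layout shift."""
    return Node("window", "Save dialog", (0, 0, 400, 300), [
        Node("text field", "File name", (20, 40 + offset_y, 360, 24)),
        Node("group", "", (20, 200 + offset_y, 360, 40), [
            Node("button", "Cancel", (200, 210 + offset_y, 80, 24)),
            Node("button", "Save", (290, 210 + offset_y, 80, 24)),
        ]),
    ])


before = build_window(0)
after = build_window(12)  # the same dialog, shifted down by 12 pixels

save_before = find(before, "button", "Save")
save_after = find(after, "button", "Save")

print(center(save_before))  # (330, 222)
print(center(save_after))   # (330, 234)
```

In this toy example a fixed click at (330, 222) would land above the shifted button's vertical range, while the role-and-name query returns the button at its new position without looking at any pixels.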

Key facts

  • Agent-desktop is a cross-platform CLI for structured desktop automation.
  • It uses OS accessibility APIs: macOS Accessibility API, Windows UI Automation, Linux AT-SPI.
  • Launched about a month ago, currently 122 stars on GitHub.
  • Contrasts with screenshot-based agents like Codex, Claude Code, CUA.
  • Screenshot methods are described as slow, expensive in tokens, and fragile.
  • Structured access is compared to Playwright's advantage over screenshot scraping on the web.
  • Tool is available at https://github.com/lahfir/agent-desktop.
  • Developer has been building computer-use tools for a while.

Entities

Institutions

  • GitHub
