ToolCUA: Optimal GUI-Tool Path Orchestration for Computer Use Agents
A recent arXiv paper presents ToolCUA, an end-to-end agent that learns when a Computer Use Agent (CUA) should act through the GUI and when it should call a tool. CUAs work in a hybrid action space that combines atomic GUI actions (such as clicking and typing) with high-level tool calls (such as API-driven file operations). In practice they often struggle to decide whether to continue with GUI actions or switch to a tool, which leads to inefficient execution paths. The paper attributes this to three factors: a scarcity of high-quality interleaved GUI-Tool trajectories, the difficulty and fragility of collecting real tool trajectories, and insufficient trajectory-level guidance for path selection. To address the data problem, ToolCUA introduces an Interleaved GUI-Tool Trajectory Scaling Pipeline that repurposes abundant static GUI trajectories and synthesizes a grounded tool library, producing diverse GUI-Tool trajectories without manual engineering or real tool-trajectory collection. The paper's arXiv identifier is 2605.12481.
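To make the idea of an interleaved GUI-Tool trajectory concrete, here is a minimal Python sketch. It is not the paper's pipeline: the class names (GuiAction, ToolCall), the helper replace_span, and the save_file tool are all hypothetical, chosen only to illustrate how a run of atomic GUI steps in a static trajectory could be swapped for a single high-level tool call.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class GuiAction:
    """An atomic GUI step (hypothetical representation)."""
    kind: str    # e.g. "click" or "type"
    target: str  # free-text description of the element or typed text


@dataclass
class ToolCall:
    """A high-level tool invocation (hypothetical representation)."""
    name: str
    args: dict = field(default_factory=dict)


# A trajectory step is either a GUI action or a tool call.
Step = Union[GuiAction, ToolCall]


def replace_span(trajectory: list[Step], start: int, end: int,
                 tool: ToolCall) -> list[Step]:
    """Return a copy of the trajectory with steps [start, end) replaced
    by a single tool call. The input list is left unchanged."""
    if not 0 <= start < end <= len(trajectory):
        raise ValueError("span out of range")
    return trajectory[:start] + [tool] + trajectory[end:]


# A static, GUI-only trajectory (illustrative data, not from the paper).
gui_only = [
    GuiAction("click", "File menu"),
    GuiAction("click", "Save As..."),
    GuiAction("type", "report.pdf"),
    GuiAction("click", "Save button"),
    GuiAction("click", "Close window"),
]

# Assumed tool that covers the first four GUI steps.
save_tool = ToolCall("save_file", {"path": "report.pdf"})

# The resulting trajectory mixes one tool call with the remaining GUI step.
interleaved = replace_span(gui_only, 0, 4, save_tool)
```

In this toy case the five-step GUI path shrinks to two steps, which is the kind of shorter execution path that choosing a tool at the right moment is meant to provide.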
Key facts
- ToolCUA is an end-to-end agent for GUI-Tool path orchestration in Computer Use Agents (CUAs).
- It learns optimal GUI-Tool path selection.
- CUAs use both atomic GUI actions and high-level tool calls.
- The hybrid action space causes uncertainty in path selection.
- Scarcity of high-quality interleaved GUI-Tool trajectories is one cause of the problem.
- The Interleaved GUI-Tool Trajectory Scaling Pipeline repurposes static GUI trajectories.
- The pipeline also synthesizes a grounded tool library.
- The paper is on arXiv with ID 2605.12481.
Entities
Institutions
- arXiv