cotomi Act: AI Browser Agent Learns by Watching Users
Researchers have unveiled cotomi Act, a computer-using AI agent that operates inside a web browser and builds up organizational knowledge by passively observing how users work. The system combines multi-step task execution with continual abstraction of that knowledge. On the 179-task human-evaluation subset of WebArena, cotomi Act reached a success rate of 80.4%, exceeding the reported human baseline of 78.2%. Its main techniques are adaptive lazy observation, verbal-diff-based history compression, and coarse-grained actions, along with test-time scaling through best-of-N action selection. A behavior-to-knowledge pipeline incrementally abstracts browsing activity into editable artifacts such as task boards and a wiki. In controlled proxy evaluations, task success rose as behavior-derived knowledge accumulated. The system was also shown in a live demonstration.
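Best-of-N action selection is a generic test-time scaling pattern: sample several candidate actions, score each one, and execute the highest-scoring candidate. The source does not say how cotomi Act proposes or scores candidates, so the following is only a minimal sketch under stated assumptions. `best_of_n`, the proposer, and the scorer are hypothetical stand-ins (for example, the proposer could be a language model sampled at nonzero temperature and the scorer some verifier), not the system's actual components.

```python
from typing import Callable


def best_of_n(
    propose: Callable[[], str],
    score: Callable[[str], float],
    n: int = 4,
) -> str:
    """Sample n candidate actions and return the one with the highest score.

    Hypothetical helper: `propose` and `score` are assumed interfaces,
    not cotomi Act's real API.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [propose() for _ in range(n)]
    # max() returns the first candidate among equally scored ones.
    return max(candidates, key=score)


# Toy proposer that yields fixed candidates (stands in for a sampled model).
proposals = iter(["click('Submit')", "type('search', 'invoice')", "scroll(down)"])


# Toy scorer that prefers actions filling in a text field (assumption for illustration).
def toy_score(action: str) -> float:
    return 1.0 if action.startswith("type(") else 0.0


chosen = best_of_n(lambda: next(proposals), toy_score, n=3)
print(chosen)  # type('search', 'invoice')
```

Raising N trades extra inference calls for a better chance that at least one candidate is good, which is why this counts as test-time scaling rather than a change to the underlying model.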
Key facts
- cotomi Act is a browser-based computer-using agent
- It learns organizational knowledge by watching user behavior
- Achieves an 80.4% success rate on the WebArena human-evaluation subset (179 tasks)
- Exceeds the reported 78.2% human baseline
- Uses adaptive lazy observation, verbal-diff-based history compression, coarse-grained actions, and best-of-N action selection
- Behavior-to-knowledge pipeline abstracts browsing into task boards and a wiki
- Shared workspace is editable by both user and agent
- Task success improves with accumulated behavior-derived knowledge
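Verbal-diff-based history compression is named but not specified in the source. One plausible reading, assumed here, is that instead of keeping every full page observation in the agent's context, the history stores the first observation in full and then only a short textual description of what changed at each step. The sketch below uses Python's standard `difflib.ndiff` to produce such descriptions. `verbal_diff` and `compress_history` are hypothetical names, and a real agent could instead phrase the changes with a language model rather than a line diff.

```python
import difflib


def verbal_diff(prev_obs: str, curr_obs: str) -> str:
    """Describe in words how page text changed between two observations.

    Hypothetical helper: a line-level diff stands in for whatever
    diffing cotomi Act actually performs.
    """
    added, removed = [], []
    for line in difflib.ndiff(prev_obs.splitlines(), curr_obs.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Unchanged lines ("  ") and intraline hints ("? ") are skipped.
    if not added and not removed:
        return "No visible change."
    parts = []
    if removed:
        parts.append("Removed: " + "; ".join(removed))
    if added:
        parts.append("Added: " + "; ".join(added))
    return ". ".join(parts) + "."


def compress_history(observations: list[str]) -> list[str]:
    """Keep the first observation verbatim, then only verbal diffs (assumed scheme)."""
    if not observations:
        return []
    history = [observations[0]]
    for prev, curr in zip(observations, observations[1:]):
        history.append(verbal_diff(prev, curr))
    return history


pages = [
    "Inbox\nCart: 0 items",
    "Inbox\nCart: 1 items",
    "Inbox\nCart: 1 items",
]
for entry in compress_history(pages):
    print(entry)
```

Steps where nothing changed collapse to a single short sentence, which is where the context savings of this kind of scheme would come from.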
Entities
Institutions
- arXiv