Claw-Anything: Benchmarking Always-On AI Assistants
Researchers have introduced Claw-Anything, a benchmark for evaluating large language model agents that act as always-on personal assistants with broad access to a user's digital environment. Existing benchmarks cover only narrow slices of user data, which limits how well they can test context-aware reasoning. Claw-Anything expands agent context along three dimensions: long-horizon activity histories, interdependent backend services, and integrated GUI and CLI interaction across various devices. To create realistic scenarios, the benchmark uses multi-round event injection to simulate months of user activity, mixing in irrelevant events and conflicting signals. Agents must reason over this complex context while staying robust to noise. The work is described in arXiv preprint 2605.26086.
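The idea of multi-round event injection can be illustrated with a small sketch. This is not the benchmark's actual implementation: the `Event` type, the `inject_events` function, its parameters, and the sample event texts are all hypothetical, chosen only to show how a simulated activity log might interleave task-relevant events with irrelevant and conflicting ones over several rounds.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    day: int    # simulated day on which the event occurs
    kind: str   # "relevant", "irrelevant", or "conflicting" (hypothetical labels)
    text: str


def inject_events(rounds, days_per_round, noise_per_round, seed=0):
    """Build a simulated activity log over several injection rounds.

    Hypothetical sketch: each round adds one task-relevant event, a stale
    signal that contradicts it (from the second round on), and some
    irrelevant noise events. The log is returned in chronological order.
    """
    rng = random.Random(seed)  # seeded so the log is reproducible
    log = []
    for r in range(rounds):
        start = r * days_per_round

        def pick_day():
            return start + rng.randrange(days_per_round)

        # The fact the agent actually needs in this round.
        log.append(Event(pick_day(), "relevant", f"Meeting moved to slot {r}"))
        # A stale reminder that contradicts the latest relevant event.
        if r > 0:
            log.append(Event(pick_day(), "conflicting",
                             f"Reminder: meeting is at slot {r - 1}"))
        # Noise unrelated to the task.
        for i in range(noise_per_round):
            log.append(Event(pick_day(), "irrelevant", f"Newsletter {r}-{i}"))
    log.sort(key=lambda e: e.day)
    return log


if __name__ == "__main__":
    # Three rounds of 30 days each, roughly three months of activity.
    for event in inject_events(rounds=3, days_per_round=30, noise_per_round=5):
        print(event.day, event.kind, event.text)
```

In a setup like this, an agent would have to find the most recent relevant event while discounting both the noise and the stale reminders, which is the kind of robustness the benchmark is described as testing.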
Key facts
- Claw-Anything is a new benchmark for always-on AI assistants.
- It expands agent context across three dimensions: long-horizon activity histories, interdependent backend services, and integrated GUI/CLI interaction.
- The benchmark simulates months of user activity via multi-round event injection.
- It includes realistic noise such as irrelevant events and conflicting signals.
- The research is available as arXiv preprint 2605.26086.
Entities
Institutions
- arXiv