AsyncTool Benchmark Evaluates LLM Agents on Asynchronous Tool Calling
Researchers have introduced AsyncTool, a benchmark designed to evaluate the asynchronous function calling capabilities of large language model (LLM)-based agents in multi-task environments. Existing evaluations typically overlook tool response latency and are limited to single-task settings. AsyncTool addresses this by simulating realistic tool feedback delays while presenting multiple heterogeneous tasks concurrently. The benchmark uses a hybrid data evolution strategy to construct a diverse dataset covering multiple scenarios. The work is detailed in arXiv preprint 2605.27995.
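To make the latency point concrete, here is a minimal Python sketch of how delayed tool feedback changes the cost of handling several tasks at once. It is an illustration only, not the AsyncTool harness: the `call_tool` helper, the task names, and the delay values are hypothetical, and `asyncio.sleep` merely stands in for a slow tool.

```python
import asyncio
import time


async def call_tool(name: str, delay: float) -> str:
    """Hypothetical tool whose response arrives after `delay` seconds."""
    await asyncio.sleep(delay)  # stands in for real tool latency
    return f"{name}: done"


async def run_sequential(tasks: list[tuple[str, float]]) -> list[str]:
    """Blocking style: wait for each tool response before the next call."""
    return [await call_tool(name, delay) for name, delay in tasks]


async def run_concurrent(tasks: list[tuple[str, float]]) -> list[str]:
    """Asynchronous style: issue every call, then collect the responses."""
    return await asyncio.gather(*(call_tool(n, d) for n, d in tasks))


def timed(coro) -> tuple[list[str], float]:
    """Run a coroutine to completion and report its wall-clock time."""
    start = time.perf_counter()
    results = asyncio.run(coro)
    return list(results), time.perf_counter() - start


# Hypothetical heterogeneous tasks with made-up delays (seconds).
TASKS = [("search_flights", 0.3), ("check_weather", 0.1), ("book_hotel", 0.2)]

if __name__ == "__main__":
    seq_results, seq_time = timed(run_sequential(TASKS))
    con_results, con_time = timed(run_concurrent(TASKS))
    print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

In the sequential version the total wait is roughly the sum of the delays, while in the concurrent version it is roughly the longest single delay. An agent that cannot issue further calls while one is pending pays the first cost.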
Key facts
- AsyncTool is a benchmark for evaluating asynchronous tool calling in LLM-based agents.
- It targets the temporal dimension of tool use, specifically tool response latency, which existing evaluations often ignore.
- It simulates multi-task environments in which multiple heterogeneous tasks are presented concurrently and tool feedback is delayed.
- A hybrid data evolution strategy was used to construct the dataset, which covers multiple scenarios.
- The work focuses on concurrent task execution under realistic conditions.
- The research is available as arXiv preprint 2605.27995.
Entities
Institutions
- arXiv