SkillFlow Benchmark Tests Autonomous Agents' Lifelong Skill Discovery and Evolution
SkillFlow is a new benchmark that assesses whether autonomous agents can discover, repair, and maintain skills over time. Whereas current benchmarks mostly test whether models can apply skills they are given, SkillFlow evaluates whether agents can learn from experience to build coherent skill libraries. The benchmark comprises 166 tasks organized into 20 families, each following a Domain-Agnostic Execution Flow (DAEF) that imposes a uniform workflow structure across tasks and so enables consistent evaluation.

Agents are evaluated under an Agentic Lifelong Learning protocol: they begin with no skills and work through the tasks of each family sequentially, externalizing what they learn as trajectory- and rubric-driven patches to a growing skill library. Experiments reveal a substantial capability gap in existing agents. The work responds to the advancing capability frontier of autonomous agents, which are increasingly adept at executing specialized tasks via plug-and-play external skills. The paper is available on arXiv as 2604.17308v1.
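To make the protocol concrete, below is a minimal Python sketch of a lifelong-learning evaluation loop of this kind. It is an illustration only, not the paper's implementation: the names (`Skill`, `SkillLibrary`, `agent.attempt`, `agent.reflect`) are hypothetical, and the patch format, rubric interface, and per-family library reset are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    """A reusable procedure the agent has externalized from past tasks."""
    name: str
    body: str  # e.g. a prompt fragment, tool recipe, or code snippet

@dataclass
class SkillLibrary:
    skills: dict = field(default_factory=dict)

    def apply_patch(self, patch: dict) -> None:
        # A patch may add a new skill or repair an existing one;
        # here a patch is just a name/body pair (an assumption).
        self.skills[patch["name"]] = Skill(patch["name"], patch["body"])

def lifelong_eval(agent, task_families):
    """Run tasks sequentially within each family, growing a skill library.

    `agent` is assumed to expose two methods:
      attempt(task, library) -> (trajectory, score)
      reflect(trajectory, rubric) -> patch dict or None
    """
    results = []
    for family in task_families:
        library = SkillLibrary()  # the agent starts with no skills
        for task in family:
            # Attempt the task with whatever skills exist so far.
            trajectory, score = agent.attempt(task, library)
            # Distill the trajectory and rubric feedback into a skill
            # patch (a new skill, or a repair of a faulty one).
            patch = agent.reflect(trajectory, task.rubric)
            if patch is not None:
                library.apply_patch(patch)
            results.append((task.id, score))
    return results
```

The key design point this sketch tries to capture is that the skill library, not the agent's weights, is the carrier of learning: each task can only benefit from patches externalized on earlier tasks in the same family.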
Key facts
- SkillFlow is a new benchmark for autonomous agents
- It evaluates lifelong skill discovery, repair, and maintenance
- The benchmark includes 166 tasks across 20 families
- Tasks follow a Domain-Agnostic Execution Flow (DAEF)
- Agents are evaluated under an Agentic Lifelong Learning protocol
- Experiments reveal a substantial capability gap in current agents' lifelong skill learning
- The research paper is arXiv:2604.17308v1
- Current benchmarks mostly test whether models can use provided skills