SkillFlow Benchmark Tests Autonomous Agents' Lifelong Skill Discovery and Evolution
SkillFlow is a new benchmark that assesses whether autonomous agents can discover, repair, and maintain skills over time. Whereas current benchmarks mostly test whether models can apply skills they are given, SkillFlow evaluates whether agents can learn from experience to build coherent skill libraries. The benchmark comprises 166 tasks organized into 20 families, each following a Domain-Agnostic Execution Flow (DAEF) that imposes a uniform workflow structure across tasks and so enables consistent evaluation.

Agents are evaluated under an Agentic Lifelong Learning protocol: they begin with no skills and work through the tasks of each family sequentially, externalizing what they learn as trajectory- and rubric-driven patches to a growing skill library. Experiments reveal a substantial capability gap in existing agents. The work responds to the advancing capability frontier of autonomous agents, which are increasingly adept at executing specialized tasks via plug-and-play external skills. The paper is available on arXiv as 2604.17308v1.
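To make the protocol concrete, below is a minimal Python sketch of a lifelong-learning evaluation loop of this kind. It is an illustration only, not the paper's implementation: the names (`Skill`, `SkillLibrary`, `agent.attempt`, `agent.reflect`) are hypothetical, and the patch format, rubric interface, and per-family library reset are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    """A reusable procedure the agent has externalized from past tasks."""
    name: str
    body: str  # e.g. a prompt fragment, tool recipe, or code snippet

@dataclass
class SkillLibrary:
    skills: dict = field(default_factory=dict)

    def apply_patch(self, patch: dict) -> None:
        # A patch may add a new skill or repair an existing one;
        # here a patch is just a name/body pair (an assumption).
        self.skills[patch["name"]] = Skill(patch["name"], patch["body"])

def lifelong_eval(agent, task_families):
    """Run tasks sequentially within each family, growing a skill library.

    `agent` is assumed to expose two methods:
      attempt(task, library) -> (trajectory, score)
      reflect(trajectory, rubric) -> patch dict or None
    """
    results = []
    for family in task_families:
        library = SkillLibrary()  # the agent starts with no skills
        for task in family:
            # Attempt the task with whatever skills exist so far.
            trajectory, score = agent.attempt(task, library)
            # Distill the trajectory and rubric feedback into a skill
            # patch (a new skill, or a repair of a faulty one).
            patch = agent.reflect(trajectory, task.rubric)
            if patch is not None:
                library.apply_patch(patch)
            results.append((task.id, score))
    return results
```

The key design point this sketch tries to capture is that the skill library, not the agent's weights, is the carrier of learning: each task can only benefit from patches externalized on earlier tasks in the same family.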
Key facts
- SkillFlow is a new benchmark for autonomous agents
- It evaluates lifelong skill discovery, repair, and maintenance
- The benchmark includes 166 tasks across 20 families
- Tasks follow a Domain-Agnostic Execution Flow (DAEF)
- Agents are evaluated under an Agentic Lifelong Learning protocol
- Experiments reveal a substantial capability gap in current agents' lifelong skill learning
- The research paper is arXiv:2604.17308v1
- Current benchmarks mostly test whether models can use provided skills