SkillRet Benchmark for LLM Agent Skill Retrieval

other · 2026-05-09

Researchers have released SkillRet, a large-scale benchmark for skill retrieval in LLM agents. It contains 17,810 publicly available agent skills, annotated with structured semantic tags and organized in a two-level taxonomy of 6 major categories and 18 sub-categories. The benchmark provides 63,259 training samples and 4,997 evaluation queries with disjoint skill pools, so it supports both benchmarking and retrieval-oriented training. SkillRet targets an often-overlooked problem: selecting the right skill from a large library while staying within tight context and latency budgets.
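To make the retrieval task concrete, here is a minimal Python sketch, assuming a toy skill library. Given a query, it ranks the skills and keeps only the top `k`, since only a handful of skill descriptions can fit in an agent's context. The `Skill` fields, the example skills and their category labels, and the token-overlap scorer are all hypothetical illustrations. They are not SkillRet's schema or any baseline from the paper, and a practical retriever would more likely use a learned encoder than word overlap.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Skill:
    # Hypothetical record layout for illustration; not SkillRet's actual schema.
    skill_id: str
    description: str
    category: str     # top-level taxonomy category (hypothetical label)
    subcategory: str  # second-level sub-category (hypothetical label)
    tags: list[str] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    # Lowercased alphanumeric tokens: a crude stand-in for a learned encoder.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, library: list[Skill], k: int) -> list[str]:
    """Return the IDs of the k skills whose description and tags share the most
    tokens with the query; ties are broken alphabetically by skill_id."""
    query_tokens = tokenize(query)

    def score(skill: Skill) -> int:
        return len(query_tokens & tokenize(skill.description + " " + " ".join(skill.tags)))

    ranked = sorted(library, key=lambda s: (-score(s), s.skill_id))
    return [s.skill_id for s in ranked[:k]]


# Hypothetical toy library; these skills and labels are made up.
library = [
    Skill("pdf-extract", "Extract text and tables from PDF files", "Documents", "Parsing", ["pdf", "ocr"]),
    Skill("git-commit", "Write a commit message for staged changes", "Development", "Version control", ["git"]),
    Skill("csv-plot", "Plot columns from a CSV file", "Data", "Visualization", ["csv", "chart"]),
]

print(retrieve("pull the tables out of this pdf", library, k=1))  # ['pdf-extract']
print(retrieve("plot a chart from my csv", library, k=1))         # ['csv-plot']
```

Here `k` stands in for the context budget: a smaller `k` keeps the prompt short and cheap but raises the risk that the skill the agent actually needs is cut off.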

Key facts

SkillRet is a large-scale benchmark for skill retrieval in LLM agents.
Contains 17,810 public agent skills.
Skills organized with structured semantic tags and a two-level taxonomy.
Taxonomy covers 6 major categories and 18 sub-categories.
Provides 63,259 training samples.
Provides 4,997 evaluation queries with disjoint skill pools.
Enables benchmarking and retrieval-oriented training.
Addresses the challenge of skill selection in large libraries.
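The taxonomy and split figures listed above lend themselves to simple mechanical checks. The sketch below assumes skills are available as hypothetical `(skill_id, category, subcategory)` tuples. It counts top-level categories and sub-categories and tests whether two skill pools share any skill. It reads "disjoint skill pools" as "no skill ID appears in both pools", which is an interpretation of the summary rather than a documented detail of the release. The function names and toy records are illustrative only.

```python
from collections import defaultdict


def taxonomy_shape(skills):
    """Return (number of top-level categories, number of sub-categories).

    `skills` is an iterable of (skill_id, category, subcategory) tuples, a
    hypothetical layout. Sub-categories are counted as (category, subcategory)
    pairs, so the same label under two parents counts twice.
    """
    tree = defaultdict(set)
    for _skill_id, category, subcategory in skills:
        tree[category].add(subcategory)
    return len(tree), sum(len(subs) for subs in tree.values())


def pools_are_disjoint(train_skill_ids, eval_skill_ids):
    # True when no skill ID appears in both the training and evaluation pools.
    return set(train_skill_ids).isdisjoint(eval_skill_ids)


# Hypothetical toy records, not drawn from SkillRet.
skills = [
    ("pdf-extract", "Documents", "Parsing"),
    ("docx-summarize", "Documents", "Summarization"),
    ("git-commit", "Development", "Version control"),
]

print(taxonomy_shape(skills))                               # (2, 3)
print(pools_are_disjoint({"pdf-extract"}, {"git-commit"}))  # True
```

Run over the full skill list, `taxonomy_shape` should return `(6, 18)` if the taxonomy is laid out as described in the list above.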

Sources

arXiv cs.AI — 2026-05-09