ARTFEED — Contemporary Art Intelligence

Corpus2Skill: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG

ai-technology · 2026-04-30

Corpus2Skill is a technique that converts a document collection into a structured directory of skill files that an LLM agent can explore rather than merely query. It targets a shortcoming of Retrieval-Augmented Generation (RAG), in which the agent passively consumes search results: the agent instead receives an overview of the whole corpus, so it can drill into specific topics, backtrack from unproductive paths, and combine information across branches. The offline compilation step iteratively clusters documents, generates LLM-written summaries at each level, and assembles the results into a navigable tree of skill files. At serve time, the agent uses this hierarchy to decide where to look and retrieves full documents by their IDs. The approach was evaluated on WixQA, an enterprise QA dataset, where it showed improvements over traditional RAG.
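The compilation step described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the names SkillNode, compile_corpus, naive_summary and leaf_ids are hypothetical, fixed-size grouping stands in for real document clustering, and a truncating placeholder stands in for the LLM summarizer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SkillNode:
    """One node of the skill tree (hypothetical structure)."""
    summary: str
    children: List["SkillNode"] = field(default_factory=list)
    doc_ids: List[str] = field(default_factory=list)  # filled only on leaves


def naive_summary(texts: List[str]) -> str:
    # Placeholder for an LLM call: keep the first four words of each input.
    return " | ".join(" ".join(t.split()[:4]) for t in texts)


def compile_corpus(
    docs: Dict[str, str],
    fan_out: int = 2,
    summarize: Callable[[List[str]], str] = naive_summary,
) -> SkillNode:
    """Build a summary tree bottom-up from a {doc_id: text} mapping."""
    if not docs:
        raise ValueError("docs must not be empty")
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    # Level 0: one leaf per document, holding the ID used for later lookup.
    level = [
        SkillNode(summary=summarize([text]), doc_ids=[doc_id])
        for doc_id, text in sorted(docs.items())
    ]
    # Assumption: grouping neighbours in fixed-size chunks replaces clustering.
    while len(level) > 1:
        groups = [level[i:i + fan_out] for i in range(0, len(level), fan_out)]
        level = [
            SkillNode(summary=summarize([n.summary for n in g]), children=g)
            for g in groups
        ]
    return level[0]


def leaf_ids(node: SkillNode) -> List[str]:
    """Collect every document ID reachable from a node."""
    if not node.children:
        return list(node.doc_ids)
    return [doc_id for child in node.children for doc_id in leaf_ids(child)]
```

Each pass of the loop produces one level of coarser summaries, so the root ends up holding the corpus-wide overview while the leaves keep the document IDs. A real pipeline would swap in an embedding-based clusterer and an LLM summarizer, and would write each node out as a skill file.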

Key facts

  • Corpus2Skill distills document corpora into hierarchical skill directories offline.
  • The method treats LLM agents as active navigators rather than passive consumers of search results.
  • The compilation pipeline iteratively clusters documents and generates LLM-written summaries at each level.
  • The result is a tree of navigable skill files.
  • At serve time, the agent receives a bird's-eye view of the corpus.
  • The agent can drill into topic branches via progressively finer summaries.
  • The agent can backtrack from unproductive paths and combine evidence across branches.
  • The method is evaluated on WixQA, an enterprise QA dataset.
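The serve-time behaviour listed above (drill down, backtrack, combine evidence) can be sketched as a small tree walk in Python. This is a hedged illustration only: the nested-dict tree layout, the navigate and overlap functions, and the budget parameter are all hypothetical, and keyword overlap stands in for the relevance judgment that an LLM agent would make when reading each summary.

```python
from typing import List


def overlap(question: str, summary: str) -> int:
    # Stand-in for the agent's relevance judgment: count shared words.
    return len(set(question.lower().split()) & set(summary.lower().split()))


def navigate(node: dict, question: str, budget: int = 2) -> List[str]:
    """Return up to `budget` document IDs from the most relevant branches."""
    found: List[str] = []

    def visit(n: dict) -> None:
        if len(found) >= budget:
            return
        if "doc_id" in n:  # leaf: record the ID of the full document
            found.append(n["doc_id"])
            return
        # Drill down: try the children whose summaries look most relevant first.
        ranked = sorted(
            n["children"],
            key=lambda c: overlap(question, c["summary"]),
            reverse=True,
        )
        for child in ranked:
            if overlap(question, child["summary"]) == 0:
                continue  # unproductive branch: skip it and try a sibling
            visit(child)  # returning here is the backtrack to this level

    visit(node)
    return found


# Hypothetical two-level tree; summaries and IDs are made up for illustration.
tree = {
    "summary": "help center",
    "children": [
        {"summary": "domains dns connect", "children": [
            {"summary": "connect a domain", "doc_id": "d1"},
            {"summary": "dns records", "doc_id": "d2"},
        ]},
        {"summary": "billing premium refund cancel", "children": [
            {"summary": "refund policy premium", "doc_id": "d3"},
            {"summary": "cancel premium plan", "doc_id": "d4"},
        ]},
    ],
}

evidence = navigate(tree, "refund after I connect a domain", budget=2)
```

Because the walk returns to the parent after each branch, a question that spans two topics collects document IDs from both subtrees, which is the cross-branch combination the key facts describe.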
