Skill1: Unified Skill Evolution for Language Agents via RL
Researchers have introduced Skill1, a framework that trains a single reinforcement learning policy to evolve skill selection, utilization, and distillation together for language model agents. The policy generates a query to search the skill library, re-ranks the retrieved candidates, solves the task conditioned on the selected skill, and distills a new skill from the resulting trajectory. All learning derives from a single task-outcome signal: its low-frequency trend credits selection, and its high-frequency variation credits distillation. In experiments on ALFWorld and WebShop, Skill1 outperforms prior methods that optimize these capabilities in isolation. The work addresses the problem of maintaining a skill library of reusable strategies across tasks.
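The four-step loop described above (query, re-rank, solve, distill) can be illustrated with a toy sketch. This is not the authors' implementation: `Skill`, `SkillLibrary`, `StubPolicy`, `run_episode`, the word-overlap retrieval, and the success rule are all hypothetical stand-ins chosen so the control flow is runnable. In Skill1 a single learned policy would play every role that `StubPolicy` fakes here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Skill:
    name: str
    text: str


@dataclass
class SkillLibrary:
    skills: List[Skill] = field(default_factory=list)

    def search(self, query: str, k: int = 3) -> List[Skill]:
        # Toy retrieval (assumption): rank skills by word overlap with the query.
        words = set(query.lower().split())
        ranked = sorted(
            self.skills,
            key=lambda s: len(words & set(s.text.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class StubPolicy:
    """Stand-in for the single learned policy; every method is hypothetical."""

    def make_query(self, task: str) -> str:
        return task

    def rerank(self, task: str, candidates: List[Skill]) -> Optional[Skill]:
        # Keep the top retrieved candidate, or no skill if nothing was found.
        return candidates[0] if candidates else None

    def solve(self, task: str, skill: Optional[Skill]) -> Tuple[List[str], float]:
        # Toy outcome (assumption): the task succeeds only when a skill was used.
        used = skill.name if skill else "none"
        trajectory = [task, f"used skill: {used}"]
        return trajectory, (1.0 if skill else 0.0)

    def distill(self, trajectory: List[str]) -> Skill:
        # Toy distillation: store the task description as the new skill text.
        return Skill(name="distilled", text=trajectory[0])


def run_episode(task: str, library: SkillLibrary, policy: StubPolicy) -> float:
    query = policy.make_query(task)                  # 1. generate a query
    candidates = library.search(query)               # 2a. retrieve candidates
    skill = policy.rerank(task, candidates)          # 2b. re-rank and select
    trajectory, reward = policy.solve(task, skill)   # 3. solve with the skill
    library.skills.append(policy.distill(trajectory))  # 4. distill a new skill
    return reward                                    # single task-outcome signal
```

Running `run_episode` twice on the same task with an initially empty `SkillLibrary` shows the intended dynamic in miniature: the first episode has no skill to select, and the second can reuse the skill distilled from the first.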
Key facts
- Skill1 is a framework for unified evolution of skill-augmented agents.
- It uses a single reinforcement learning policy for skill selection, utilization, and distillation.
- The policy generates a query, re-ranks candidates, solves tasks, and distills new skills.
- All learning derives from a single task-outcome signal.
- The low-frequency trend of the outcome signal credits selection; its high-frequency variation credits distillation.
- Experiments were conducted on the ALFWorld and WebShop benchmarks.
- Skill1 outperforms existing methods that optimize capabilities in isolation.
- The work appears on arXiv with ID 2605.06130.
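One way to picture the split of a single outcome signal into a low-frequency trend and a high-frequency variation is a moving-average decomposition. The summary does not say how Skill1 computes either component, so the exponential moving average, the function name `split_outcome_signal`, and the smoothing factor `alpha` below are assumptions made purely for illustration.

```python
from typing import List, Tuple


def split_outcome_signal(
    rewards: List[float], alpha: float = 0.1
) -> Tuple[List[float], List[float]]:
    """Split per-episode outcomes into a smooth trend and a residual.

    Assumption: the trend is an exponential moving average (EMA) and the
    variation is whatever the EMA does not explain. Skill1 may use a
    different decomposition.
    """
    trend: List[float] = []
    variation: List[float] = []
    ema = rewards[0] if rewards else 0.0
    for r in rewards:
        ema = (1.0 - alpha) * ema + alpha * r  # slow-moving component
        trend.append(ema)
        variation.append(r - ema)              # fast-moving residual
    return trend, variation
```

Under this reading, the trend would act as the credit for selection and the residual as the credit for distillation; by construction the two components sum back to the original outcome at every step.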
Entities
Institutions
- arXiv