SkillOS: Reinforcement Learning for Self-Evolving AI Agents
Researchers propose SkillOS, a reinforcement learning framework that enables LLM-based agents to autonomously curate reusable skills from past interactions. Current LLM-based agents fail to learn from experience, and existing approaches to skill curation rely on manual effort or heuristic operations. SkillOS pairs a frozen agent executor with a trainable skill curator: the curator updates an external SkillRepo from accumulated experience and is trained with composite rewards derived from delayed feedback. The approach targets the learning of long-term skill curation policies, a key bottleneck in self-evolving agents. The paper is available on arXiv under ID 2605.06614.
Key facts
- SkillOS is an RL training recipe for learning skill curation in self-evolving agents.
- It pairs a frozen agent executor with a trainable skill curator.
- The curator updates an external SkillRepo from accumulated experience.
- Composite rewards provide the learning signal for training the curator.
- Existing approaches rely on manual curation or heuristic operations.
- SkillOS tackles the problem of learning complex, long-term curation policies from indirect feedback.
- The paper is available on arXiv under ID 2605.06614.
- LLM-based agents currently fail to learn from past interactions.
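
The executor and curator split described in the key facts can be illustrated with a minimal Python sketch of a single curation rollout. Everything in it is an assumption made for illustration, not the paper's implementation: the SkillRepo operations (add, update, delete), the toy frozen_executor, the reward components (downstream success rate minus a repository-size penalty), and the weights w_task and w_size are all hypothetical. The sketch only shows how a reward for curation can arrive late, after the frozen executor has attempted later tasks with the updated repository; it does not implement the RL update itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# All names below are hypothetical; the paper's actual interfaces may differ.
Edit = Tuple[str, str, str]  # (operation, skill name, skill body)


@dataclass
class SkillRepo:
    """External skill store, modelled here as a plain name -> body mapping."""
    skills: Dict[str, str] = field(default_factory=dict)

    def apply(self, op: str, name: str, body: str = "") -> None:
        if op in ("add", "update"):
            self.skills[name] = body
        elif op == "delete":
            self.skills.pop(name, None)
        else:
            raise ValueError(f"unknown operation: {op}")


def frozen_executor(task: str, repo: SkillRepo) -> bool:
    """Toy stand-in for the frozen agent: succeeds only if a matching skill exists."""
    return task in repo.skills


def composite_reward(outcomes: List[bool], repo_size: int,
                     w_task: float = 1.0, w_size: float = 0.01) -> float:
    """Assumed reward: downstream success rate minus a repository-size penalty."""
    success_rate = sum(outcomes) / len(outcomes) if outcomes else 0.0
    return w_task * success_rate - w_size * repo_size


def rollout(curator: Callable[[List[str]], List[Edit]],
            past_tasks: List[str], future_tasks: List[str],
            repo: SkillRepo) -> float:
    """Curate from past experience, then score the curation on later tasks."""
    for op, name, body in curator(past_tasks):
        repo.apply(op, name, body)
    # The feedback is delayed: it is only observed after the executor
    # has attempted the later tasks with the updated repository.
    outcomes = [frozen_executor(task, repo) for task in future_tasks]
    return composite_reward(outcomes, len(repo.skills))


def add_everything(past_tasks: List[str]) -> List[Edit]:
    """Naive curator policy: store one skill per past task."""
    return [("add", task, f"steps for {task}") for task in past_tasks]


reward = rollout(add_everything, ["sort", "parse"],
                 ["sort", "parse", "plot"], SkillRepo())
print(round(reward, 3))
```

In a trained system the curator would be a learned policy rather than the fixed add_everything function, and the scalar returned by rollout would serve as its training signal.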
Entities
Institutions
- arXiv