COSPLAY: Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
COSPLAY is a framework for improving the long-horizon decision making of large language models (LLMs) in interactive environments such as games. It pairs an LLM decision agent with a learnable skill bank that stores reusable skills discovered from the agent's own unlabeled rollouts. The two components co-evolve: over time the decision agent learns to retrieve skills and select actions more effectively, which targets a known weakness of LLMs, namely consistent multi-step decision making under delayed rewards and partial observability. The paper is available on arXiv (2604.20987).
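To make the pairing of a decision agent with a skill bank more concrete, here is a minimal Python sketch. It is not the paper's implementation: the names `Skill`, `SkillBank`, and `discover_skills` are hypothetical, repeated action n-grams stand in for skill discovery from unlabeled rollouts, and word overlap weighted by a smoothed success rate stands in for learned retrieval. The LLM decision agent itself is omitted.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Skill:
    """Hypothetical skill record: a description plus usage statistics."""
    name: str
    description: str
    uses: int = 0
    successes: int = 0

    def success_rate(self) -> float:
        # Laplace smoothing: an untried skill starts at 0.5.
        return (self.successes + 1) / (self.uses + 2)


class SkillBank:
    """Toy skill bank (an assumption, not the paper's design)."""

    def __init__(self) -> None:
        self.skills: Dict[str, Skill] = {}

    def add(self, name: str, description: str) -> None:
        # Keep existing statistics if the skill is already stored.
        self.skills.setdefault(name, Skill(name, description))

    def retrieve(self, observation: str, k: int = 1) -> List[Skill]:
        # Stand-in retrieval: word overlap weighted by past success.
        obs_words = set(observation.lower().split())

        def score(skill: Skill) -> float:
            overlap = len(obs_words & set(skill.description.lower().split()))
            return overlap * skill.success_rate()

        return sorted(self.skills.values(), key=score, reverse=True)[:k]

    def update(self, name: str, success: bool) -> None:
        # Feedback from the environment shifts future retrieval scores.
        skill = self.skills[name]
        skill.uses += 1
        skill.successes += int(success)


def discover_skills(
    rollouts: Sequence[Sequence[str]], n: int = 2, min_count: int = 2
) -> List[Tuple[str, ...]]:
    # Stand-in discovery: action n-grams that recur across rollouts.
    counts: Counter = Counter()
    for actions in rollouts:
        for i in range(len(actions) - n + 1):
            counts[tuple(actions[i:i + n])] += 1
    return [seq for seq, c in counts.items() if c >= min_count]


# Illustrative loop: discover skills, retrieve one, record the outcome.
rollouts = [
    ["move", "chop", "collect"],
    ["move", "chop", "collect"],
    ["move", "open"],
]
bank = SkillBank()
for seq in discover_skills(rollouts):
    bank.add("_".join(seq), " ".join(seq))

best = bank.retrieve("collect the fallen wood")[0]
bank.update(best.name, success=True)
```

In this sketch the feedback step is what links the two sides: each recorded outcome changes a skill's success rate, which changes what the agent retrieves next. A learned system would presumably replace both the n-gram heuristic and the overlap score with trained components.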
Key facts
- COSPLAY is a co-evolution framework for LLM agents in long-horizon tasks.
- It consists of an LLM decision agent and a learnable skill bank.
- Skills are discovered from unlabeled agent rollouts.
- The framework improves skill retrieval and action selection.
- It targets the difficulty LLMs have with consistent long-horizon decision making.
- Games serve as testbeds for evaluating skill usage.
- The paper is on arXiv with ID 2604.20987.
- The approach handles delayed rewards and partial observability.
Entities
Institutions
- arXiv