COSPLAY: Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

ai-technology · 2026-04-25

A new framework called COSPLAY enables large language models (LLMs) to improve long-horizon decision making in interactive environments like games. The system pairs an LLM decision agent with a learnable skill bank that stores reusable skills discovered from the agent's own unlabeled rollouts. By co-evolving both components, the decision agent learns better skill retrieval and action selection over time, addressing a key weakness of LLMs in multi-step reasoning under delayed rewards and partial observability. The research is published on arXiv (2604.20987).

Key facts

COSPLAY is a co-evolution framework for LLM agents in long-horizon tasks.
It consists of an LLM decision agent and a learnable skill bank.
Skills are discovered from unlabeled agent rollouts.
The framework improves skill retrieval and action selection.
It targets the difficulty LLMs have in making consistent decisions over long horizons.
Games serve as testbeds for evaluating skill usage.
The paper is on arXiv with ID 2604.20987.
The approach handles delayed rewards and partial observability.
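
To make the pairing of a decision agent with a skill bank more concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the paper: the `Skill` and `SkillBank` classes, the word-overlap retrieval score, and the success-rate update are all assumptions standing in for whatever learned retrieval and skill discovery COSPLAY actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    # Hypothetical skill record; the paper's actual skill format is not given in the source.
    name: str
    description: str      # natural-language summary used for retrieval
    actions: list[str]    # illustrative action sequence the skill expands to
    uses: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        # Laplace-smoothed estimate, so an unused skill starts at 0.5.
        return (self.successes + 1) / (self.uses + 2)


def _tokens(text: str) -> set[str]:
    return set(text.lower().split())


@dataclass
class SkillBank:
    skills: dict[str, Skill] = field(default_factory=dict)

    def add(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def retrieve(self, observation: str, k: int = 1) -> list[Skill]:
        # Assumed scoring: Jaccard word overlap weighted by past success rate.
        obs = _tokens(observation)

        def score(s: Skill) -> float:
            desc = _tokens(s.description)
            union = obs | desc
            overlap = len(obs & desc) / len(union) if union else 0.0
            return overlap * s.success_rate

        return sorted(self.skills.values(), key=score, reverse=True)[:k]

    def record_outcome(self, name: str, success: bool) -> None:
        # Feedback from a rollout updates the skill's usage statistics.
        s = self.skills[name]
        s.uses += 1
        s.successes += int(success)


bank = SkillBank()
bank.add(Skill("open_door", "find key and open the locked door", ["pick_key", "use_key"]))
bank.add(Skill("craft_tool", "collect wood and craft a tool", ["collect_wood", "craft"]))

best = bank.retrieve("the door is locked", k=1)[0]
bank.record_outcome(best.name, success=True)
```

In this toy loop the bank improves only through outcome counts; in the framework described above, both the decision agent and the skill bank are updated together, which this sketch does not attempt to reproduce.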
