CoSPlay: Cooperative Self-Play for Test-Time Code Generation

other · 2026-05-25

CoSPlay is a framework, described in arXiv paper 2605.23491, that removes the dependence on ground-truth unit tests in code generation with large language models (LLMs). Existing techniques such as reinforcement learning with verifiable rewards (RLVR) and test-time scaling (TTS) rely on ground-truth tests, which are costly to obtain and therefore a bottleneck. CoSPlay is ground-truth-free and training-free: it uses cooperative self-play to improve the code and the unit tests jointly. It first explores diverse solution ideas and identifies potential failure modes, which it uses to derive unit test ideas. It then refines both code and tests using bidirectional pass-count signals, mitigating the noise and the spurious coupling with wrong code that self-generated tests often show.

Key facts

CoSPlay is a ground-truth-free, training-free framework for LLM code generation
It jointly improves code and unit tests through cooperative self-play
It explores diverse solution ideas and identifies potential failure modes
It uses bidirectional pass-count signals for refinement
The paper is arXiv:2605.23491
It addresses the bottleneck of ground-truth unit tests in RLVR and TTS methods
Self-generated unit tests are often noisy or spuriously coupled with wrong code
CoSPlay enables effective test-time scaling without ground-truth tests
