Pass-Rate Rewards Fail to Improve Code Generation RL

ai-technology · 2026-05-07

A research paper on arXiv (2605.02944) investigates pass-rate rewards in reinforcement learning (RL) for code generation, focusing on critic-free methods such as GRPO and RLOO. A binary reward, granted only when a program passes every test, is sparse and offers little guidance on challenging problems; a pass-rate reward, the fraction of test cases a program passes, is denser. Nevertheless, controlled experiments across several base models and algorithms indicate that pass-rate rewards do not reliably improve final performance over binary rewards. The findings suggest that, although the gradients are denser, the resulting updates do not consistently move probability mass toward solutions that pass every test.
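To make the two reward definitions concrete, here is a minimal sketch in Python. The function names and the representation of test outcomes as a list of booleans are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def binary_reward(test_results: Sequence[bool]) -> float:
    """Return 1.0 only if every test case passed, else 0.0.

    An empty test list is treated as a failure (an assumption of this sketch).
    """
    return 1.0 if test_results and all(test_results) else 0.0


def pass_rate_reward(test_results: Sequence[bool]) -> float:
    """Return the fraction of test cases that passed (0.0 for an empty list)."""
    if not test_results:
        return 0.0
    return sum(test_results) / len(test_results)


# Hypothetical outcome: a program that passes 3 of its 4 test cases.
results = [True, True, False, True]
print(binary_reward(results))     # 0.0
print(pass_rate_reward(results))  # 0.75
```

A near-miss program earns nothing under the binary reward but most of the available credit under the pass-rate reward, which is what makes the latter signal denser.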

Key facts

arXiv paper 2605.02944 studies pass-rate rewards in RL for code generation.
The binary pass-all-tests reward is sparse for challenging problems.
Pass-rate rewards use the test-case pass rate as a surrogate.
The study covers the critic-free RL methods GRPO and RLOO.
Pass-rate rewards do not reliably improve performance over binary rewards.
Denser gradients from pass-rate rewards do not consistently move probability mass toward full-pass solutions.
Experiments were controlled across base models and algorithms.
The findings challenge the common remedy of replacing binary rewards with pass-rate rewards.
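The sketch below illustrates why pass-rate rewards yield a denser signal for critic-free methods. It uses the commonly cited formulations of the two estimators: a group-normalized advantage for GRPO and a leave-one-out baseline for RLOO. The function names, the `eps` stabilizer, and the example reward values are assumptions for illustration; the paper's exact setup may differ.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def grpo_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Group-normalized advantages: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards: Sequence[float]) -> List[float]:
    """Leave-one-out advantages: each reward minus the mean of the others."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


# Hypothetical group of four samples for one hard problem; none passes all tests.
binary = [0.0, 0.0, 0.0, 0.0]
pass_rate = [0.25, 0.5, 0.75, 0.5]

print(grpo_advantages(binary))     # all zeros: no learning signal
print(rloo_advantages(binary))     # all zeros: no learning signal
print(grpo_advantages(pass_rate))  # non-zero: partial solutions are ranked
print(rloo_advantages(pass_rate))  # non-zero: partial solutions are ranked
```

With binary rewards, a group in which no sample passes every test produces zero advantages and therefore no update. With pass-rate rewards the same group produces non-zero advantages, but these only favor the sample that passes the most tests, which is not necessarily a step toward a full-pass solution. This toy example shows the mechanism only and does not reproduce the paper's experiments.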
