ARTFEED — Contemporary Art Intelligence

ReCode: Enhancing Code Generation via Reasoning-Process Rewards

ai-technology · 2026-05-07

Researchers propose ReCode, a reinforcement-learning framework for code generation that optimizes the quality of the model's reasoning process, not just its final outputs. It addresses two challenges: the scarcity of fine-grained preference data for training reward models, and the risk of reward hacking when a learned reward is optimized directly. ReCode combines Contrastive Reasoning-Process Reward Learning (CRPL), which trains a reward model on synthesized optimized and degraded reasoning variants, with Consistency-Gated GRPO (CG-GRPO), which integrates the reasoning-process reward with execution outcomes. The work is detailed in arXiv paper 2508.05170.
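The contrastive idea behind CRPL can be illustrated with a standard pairwise (Bradley-Terry-style) ranking loss: given a reward score for an optimized reasoning variant and one for a degraded variant of the same problem, the reward model is pushed to score the optimized variant higher. This is a minimal sketch of that training objective, not the paper's exact formulation; the function name and loss form are assumptions for illustration.

```python
import math

def pairwise_contrastive_loss(r_optimized: float, r_degraded: float) -> float:
    """Pairwise ranking loss for reward-model training (illustrative sketch).

    r_optimized: reward-model score for the synthesized optimized variant
    r_degraded:  reward-model score for the synthesized degraded variant

    Returns -log(sigmoid(r_optimized - r_degraded)): the loss is small when
    the model already ranks the optimized variant above the degraded one,
    and large when the ranking is inverted.
    """
    margin = r_optimized - r_degraded
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A correct ranking (optimized scored higher) yields a lower loss than an
# inverted one, so gradient descent on this loss teaches the reward model
# to prefer higher-quality reasoning processes.
loss_correct = pairwise_contrastive_loss(2.0, 0.0)
loss_inverted = pairwise_contrastive_loss(0.0, 2.0)
```

In practice the scores would come from a neural reward model over reasoning traces, and the loss would be averaged over a batch of synthesized variant pairs.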

Key facts

  • ReCode stands for Reasoning-Reinforced Code Generation.
  • It uses Contrastive Reasoning-Process Reward Learning (CRPL).
  • CRPL trains a reward model with synthesized optimized and degraded reasoning variants.
  • Consistency-Gated GRPO (CG-GRPO) gates neural reasoning-process rewards with execution outcomes.
  • The framework aims to improve code generation by optimizing reasoning quality.
  • It addresses scarcity of fine-grained preference data for reward model training.
  • It mitigates reward hacking by gating the learned reward on execution outcomes, so the policy cannot be rewarded for plausible-sounding reasoning that produces failing code.
  • The paper is available on arXiv with ID 2508.05170.
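The gating idea in CG-GRPO can be sketched as follows. The exact gating rule and coefficients are not specified here, so this sketch assumes a simple form: the neural reasoning-process reward is credited only when the generated code actually passes its tests, and the gated rewards are then normalized within a sampled group as in standard GRPO. Function names are illustrative.

```python
def gated_reward(process_reward: float, tests_passed: bool) -> float:
    """Consistency gate (assumed form): the execution outcome anchors the
    signal. A failing program earns nothing, so the policy cannot hack the
    neural process reward with reasoning that does not yield correct code."""
    exec_reward = 1.0 if tests_passed else 0.0
    return exec_reward + (process_reward if tests_passed else 0.0)

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages as in GRPO: normalize each sample's reward
    by the mean and standard deviation of its sampled group, removing the
    need for a separate value network."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Example: four sampled completions for one prompt; only the first passes
# its tests, so only it receives the process-reward bonus.
group = [gated_reward(0.8, True),
         gated_reward(0.9, False),
         gated_reward(0.2, False),
         gated_reward(0.5, False)]
advantages = grpo_advantages(group)
```

Under this sketch the passing completion gets a strongly positive advantage and the failing ones slightly negative advantages, which is the policy-gradient signal CG-GRPO would feed back into the generator.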

Entities

Institutions

  • arXiv

Sources