CodeRL+ Enhances Code Generation with Execution Semantics Alignment
A new approach called CodeRL+ improves code generation in Large Language Models by integrating execution semantics alignment into the Reinforcement Learning with Verifiable Rewards (RLVR) pipeline. Traditional RLVR methods rely on binary pass/fail signals from test cases, which provide only sparse feedback and fail to distinguish subtle logical errors from outright failures. CodeRL+ instead trains the model to infer variable-level execution trajectories, providing direct learning signals grounded in execution semantics. This bridges the gap between surface-level textual code patterns and functional correctness, which is governed by formal execution semantics. The approach is detailed in a paper on arXiv (2510.18471).
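To make the baseline concrete, the binary outcome reward used in standard RLVR can be sketched as follows. This is an illustrative toy, not the paper's implementation; the `solve` entry-point name and the `(expression, expected)` test format are assumptions for the example:

```python
def outcome_reward(code: str, test_cases: list[tuple[str, object]]) -> float:
    """Return 1.0 only if the generated code passes every test case.

    Each test case is (expression, expected_value); `solve` is a
    hypothetical entry-point name that the test expressions call.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)              # define the candidate solution
        for expr, expected in test_cases:
            if eval(expr, namespace) != expected:
                return 0.0                 # a single failing test zeroes the reward
    except Exception:
        return 0.0                         # crashes also yield zero reward
    return 1.0                             # all tests passed

candidate = "def solve(x):\n    return x * 2"
tests = [("solve(2)", 4), ("solve(0)", 0)]
reward = outcome_reward(candidate, tests)  # 1.0: both tests pass
```

The sparsity problem is visible here: a solution that is wrong on one edge case and a solution that is wrong everywhere both receive exactly 0.0, so the gradient signal cannot tell them apart.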
Key facts
- CodeRL+ is a novel approach for code generation.
- It integrates execution semantics alignment into RLVR training.
- RLVR uses outcome rewards from executing test cases.
- Binary pass/fail signals are inefficient for subtle logical errors.
- CodeRL+ enables inference of variable-level execution trajectories.
- It provides direct learning signals of execution semantics.
- The paper is available on arXiv with ID 2510.18471.
- The approach addresses the semantic gap between text patterns and functional correctness.
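The variable-level execution trajectories mentioned above can be made concrete with Python's tracing hooks. The sketch below is my own illustration, not the paper's method: it collects a ground-truth trace of variable states line by line, the kind of execution-semantics signal a model would be trained to infer:

```python
import sys

def trace_variables(code: str) -> list[dict]:
    """Record a snapshot of local variables at each executed line.

    Illustrative only: CodeRL+ trains the model to *infer* such
    trajectories; here we merely collect one by actually running the code.
    """
    trajectory = []

    def tracer(frame, event, arg):
        if event == "line":
            # snapshot visible variables before this line executes
            trajectory.append(
                {k: v for k, v in frame.f_locals.items() if not k.startswith("__")}
            )
        return tracer

    sys.settrace(tracer)
    try:
        exec(compile(code, "<snippet>", "exec"), {})
    finally:
        sys.settrace(None)
    return trajectory

trace = trace_variables("x = 1\ny = x + 2\nz = x * y")
# each entry is the variable state just before the corresponding line runs,
# e.g. the snapshot before the second line is {"x": 1}
```

A trace like this exposes exactly where a logically flawed program diverges from the intended computation, which is the fine-grained signal that a binary pass/fail reward discards.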