BoostAPR: RL-Based Automated Program Repair with Dual Rewards
BoostAPR is an automated program repair (APR) system built on execution-grounded reinforcement learning. It operates in three stages: initial fine-tuning on execution-validated repair examples; training of two reward models, a sequence-level assessor and a line-level credit allocator; and Proximal Policy Optimization (PPO), which uses the line-level credit to emphasize the lines that matter most to a repair. Trained on SWE-Gym and evaluated on four benchmarks, BoostAPR reaches 40.7% on SWE-bench Verified (an improvement of 22.9 percentage points), 24.8% on Defects4J (Python-to-Java transfer), 84.5% on HumanEval-Java, and 95.0% on QuixBugs, results that indicate it generalizes across programming languages.
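The reward-shaping and policy-update steps can be sketched in Python. This is a minimal illustration under stated assumptions, not BoostAPR's implementation: the helper names `allocate_line_rewards` and `ppo_clipped_loss` are hypothetical, the sequence-level assessor is assumed to emit one scalar reward per patch, the line-level credit allocator is assumed to emit one raw score per patch line that a softmax turns into weights, and policy log-probabilities are assumed to be aggregated per line. The loss itself is the standard PPO clipped surrogate.

```python
import math
from typing import List, Sequence


def allocate_line_rewards(sequence_reward: float, line_scores: Sequence[float]) -> List[float]:
    """Split one patch-level reward across lines in proportion to softmax(line_scores).

    Hypothetical: `sequence_reward` stands in for the sequence-level assessor's score
    and `line_scores` for the line-level credit allocator's raw per-line outputs.
    """
    peak = max(line_scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in line_scores]
    total = sum(exps)
    return [sequence_reward * e / total for e in exps]


def ppo_clipped_loss(
    new_logps: Sequence[float],
    old_logps: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Standard PPO clipped surrogate, averaged over lines and negated for minimization."""
    terms = []
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new / pi_old for this line
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(ratio * adv, clipped * adv))  # pessimistic (lower) bound
    return -sum(terms) / len(terms)
```

For example, `allocate_line_rewards(1.0, [2.0, 0.0, 0.0])` gives most of a successful patch's reward to the first line; subtracting a value baseline from those per-line rewards would yield the advantages passed to `ppo_clipped_loss`. Clipping the probability ratio is PPO's standard mechanism for keeping each update close to the previous policy.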
Key facts
- BoostAPR is a three-stage framework for automated program repair
- Uses execution-grounded reinforcement learning with dual reward models
- Includes a sequence-level assessor and a line-level credit allocator
- Trained on SWE-Gym and evaluated on four benchmarks
- Achieves 40.7% on SWE-bench Verified
- Achieves 24.8% on Defects4J (Python-to-Java transfer)
- Achieves 84.5% on HumanEval-Java
- Achieves 95.0% on QuixBugs
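The execution-grounded signal behind both the execution-validated fine-tuning data and the RL rewards can be sketched as running a candidate patch against tests. In this hypothetical Python sketch (`execution_reward` and `filter_execution_validated` are illustrative names, not BoostAPR's API), a patch is a source string defining one function, each check is an (arguments, expected result) pair, and the reward is 1.0 only when every check passes. Real benchmarks such as SWE-bench run full test suites inside isolated containers; calling `exec` on untrusted code, as here, is only acceptable in a toy example.

```python
from typing import Any, Callable, Dict, List, Tuple

# One check: positional arguments and the expected return value.
ExecCheck = Tuple[tuple, Any]


def execution_reward(patched_source: str, func_name: str, checks: List[ExecCheck]) -> float:
    """Return 1.0 if the patched function passes every check, otherwise 0.0."""
    namespace: Dict[str, Any] = {}
    try:
        exec(patched_source, namespace)  # toy only: real systems sandbox untrusted code
        func: Callable[..., Any] = namespace[func_name]
        for args, expected in checks:
            if func(*args) != expected:
                return 0.0  # a wrong output on any check means no reward
    except Exception:
        return 0.0  # syntax errors, crashes and missing functions also earn nothing
    return 1.0


def filter_execution_validated(
    candidates: List[str], func_name: str, checks: List[ExecCheck]
) -> List[str]:
    """Keep only candidate patches whose execution reward is 1.0."""
    return [src for src in candidates if execution_reward(src, func_name, checks) == 1.0]
```

With `checks = [((1, 2), 3)]`, a patch defining `add` as `a - b` scores 0.0 and one defining it as `a + b` scores 1.0, so only the latter would be kept as an execution-validated example. Returning a binary score ties the reward strictly to observed execution, at the cost of giving no partial credit.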