BoostAPR: RL-Based Automated Program Repair with Dual Rewards

other · 2026-05-12

BoostAPR is a system that improves automated program repair (APR) with execution-grounded reinforcement learning. It works in three stages: (1) initial fine-tuning on execution-validated repair examples; (2) training two reward models, a sequence-level assessor that judges a patch as a whole and a line-level credit allocator that scores individual edited lines; and (3) Proximal Policy Optimization (PPO) that uses the line-level rewards to concentrate credit on the lines that matter most. Trained on SWE-Gym, BoostAPR scores 40.7% on SWE-bench Verified (an improvement of 22.9 percentage points), 24.8% on Defects4J in a Python-to-Java transfer setting, 84.5% on HumanEval-Java, and 95.0% on QuixBugs, suggesting that the approach carries over across programming languages.
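To make stage (3) concrete, here is a minimal sketch of the standard PPO clipped surrogate loss fed with line-level credit. It is not BoostAPR's implementation: the summary does not say how line rewards become token advantages, so the helper `expand_line_credit` (hypothetical) simply assumes every token on an edited line inherits that line's credit. `ppo_clip_loss` and `clip_eps` are illustrative names for the generic PPO objective.

```python
import math
from typing import List


def expand_line_credit(line_credit: List[float], tokens_per_line: List[int]) -> List[float]:
    """Hypothetical scheme: copy each patch line's credit onto every token of that line."""
    per_token: List[float] = []
    for credit, n_tokens in zip(line_credit, tokens_per_line):
        per_token.extend([credit] * n_tokens)
    return per_token


def ppo_clip_loss(logp_new: List[float], logp_old: List[float],
                  advantages: List[float], clip_eps: float = 0.2) -> float:
    """Generic PPO clipped surrogate, negated for minimization and averaged over tokens."""
    assert len(logp_new) == len(logp_old) == len(advantages) > 0
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                          # pi_new(token) / pi_old(token)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)  # keep ratio in [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)                   # pessimistic (lower) bound
    return -total / len(advantages)


# Usage: a two-line patch where the first line (3 tokens) gets most of the credit.
advantages = expand_line_credit([0.9, 0.1], [3, 2])
loss = ppo_clip_loss([0.0] * 5, [0.0] * 5, advantages)
```

With this scheme, tokens on a line the allocator deems important receive a larger advantage, so the policy gradient pushes hardest on those tokens; the clipping keeps any single update from moving the policy too far.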

Key facts

BoostAPR is a three-stage framework for automated program repair
Uses execution-grounded reinforcement learning with dual reward models
Includes a sequence-level assessor and a line-level credit allocator
Trained on SWE-Gym and evaluated on four benchmarks
Achieves 40.7% on SWE-bench Verified
Achieves 24.8% on Defects4J (Python-to-Java transfer)
Achieves 84.5% on HumanEval-Java
Achieves 95.0% on QuixBugs
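The summary names the two reward models but not how their outputs are combined. The sketch below shows one plausible way, purely as an assumption: the sequence-level assessor's score for a whole patch is split across the edited lines, mixing a uniform share with a share proportional to each line's importance score from the line-level allocator. The function `allocate_line_rewards` and its `alpha` parameter are hypothetical.

```python
from typing import List


def allocate_line_rewards(seq_reward: float, line_scores: List[float],
                          alpha: float = 0.5) -> List[float]:
    """Hypothetical split of a patch-level reward across its edited lines.

    seq_reward:  scalar score for the whole patch (sequence-level assessor).
    line_scores: non-negative importance score per edited line (line-level allocator).
    alpha:       0.0 gives a uniform split; 1.0 splits purely by importance.
    """
    n = len(line_scores)
    if n == 0:
        return []
    total = sum(line_scores)
    rewards = []
    for score in line_scores:
        # Fall back to a uniform weight if the allocator gives every line zero importance.
        weight = score / total if total > 0 else 1.0 / n
        share = (1.0 - alpha) / n + alpha * weight  # shares sum to 1 across lines
        rewards.append(seq_reward * share)
    return rewards


# Usage: a passing patch (reward 1.0) where the first line is three times as important.
print(allocate_line_rewards(1.0, [3.0, 1.0], alpha=1.0))
```

Because the shares always sum to 1, the per-line rewards add back up to the sequence-level score, so the line-level signal redistributes credit without changing the total; here `allocate_line_rewards(1.0, [3.0, 1.0], alpha=1.0)` returns `[0.75, 0.25]`.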

Sources

arXiv cs.AI — 2026-05-12