AI-Generated Python Refactoring Pull Requests Show Mixed Quality and Security Results
A new empirical study posted to arXiv (2605.21453) examines the quality and security of AI-generated Python refactoring pull requests drawn from the AIDev dataset. The researchers applied PyQu, an ML-based quality assessment tool, alongside the static analyzers Pylint and Bandit to measure changes across five quality attributes. Agentic commits improve a quality attribute in 22.5% of changes, and usability is the attribute that improves most often. Security and maintainability issues nevertheless persist after the AI edits, which highlights the risks of AI-driven code contributions.
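To make the headline figure concrete, the sketch below shows one way a share such as 22.5% could be computed: count the changes in which at least one attribute score rises after the commit, then divide by the total. The record layout, the function name `share_improving`, and the sample scores are illustrative assumptions. They are not the study's data, and PyQu's actual output format may differ.

```python
from typing import Mapping, Sequence, Tuple

# Hypothetical record: (scores before the commit, scores after the commit),
# each mapping an attribute name to a numeric score where higher is better.
Change = Tuple[Mapping[str, float], Mapping[str, float]]


def share_improving(changes: Sequence[Change]) -> float:
    """Return the fraction of changes in which at least one attribute score rises."""
    if not changes:
        return 0.0
    improved = sum(
        1
        for before, after in changes
        # Compare only attributes present in both snapshots.
        if any(after[name] > before[name] for name in before if name in after)
    )
    return improved / len(changes)


# Made-up sample values for illustration only.
sample = [
    ({"usability": 0.4, "security": 0.7}, {"usability": 0.6, "security": 0.7}),
    ({"usability": 0.5, "security": 0.6}, {"usability": 0.5, "security": 0.6}),
    ({"usability": 0.5, "security": 0.8}, {"usability": 0.5, "security": 0.6}),
    ({"usability": 0.3, "security": 0.5}, {"usability": 0.3, "security": 0.5}),
]
print(share_improving(sample))  # 0.25
```

Only the first sample change raises a score, so the result is one in four. A change that lowers a score, like the third, does not count as an improvement under this definition.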
Key facts
- Study analyzes Python refactoring PRs from AIDev dataset
- Uses PyQu, Pylint, and Bandit for quality and security assessment
- Agentic commits improve a quality attribute in 22.5% of changes
- Usability is the most improved quality attribute
- Security and maintainability issues remain after AI edits
- Research addresses gap in empirical evidence on AI code contributions
- Findings published on arXiv with identifier 2605.21453
- Study focuses on real-world GitHub repositories
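As a rough illustration of how lingering security issues might be tracked, the sketch below compares two Bandit JSON reports (as produced by `bandit -f json`) taken before and after a commit and reports the change in finding counts per severity level. It relies on the `results` list and `issue_severity` field of Bandit's JSON output. The helper names and the sample reports are hypothetical, and this is not presented as the study's actual pipeline.

```python
from collections import Counter
from typing import Any, Dict


def severity_counts(report: Dict[str, Any]) -> Counter:
    """Count findings per severity level in a parsed Bandit JSON report."""
    return Counter(item["issue_severity"] for item in report.get("results", []))


def severity_delta(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, int]:
    """Return after-minus-before finding counts for each severity level seen."""
    b, a = severity_counts(before), severity_counts(after)
    return {level: a[level] - b[level] for level in sorted(set(a) | set(b))}


# Made-up, heavily trimmed reports for illustration only.
report_before = {"results": [{"issue_severity": "HIGH"}, {"issue_severity": "LOW"}]}
report_after = {"results": [{"issue_severity": "HIGH"}]}

print(severity_delta(report_before, report_after))  # {'HIGH': 0, 'LOW': -1}
```

A delta of zero for a severity level, as with `HIGH` here, is the pattern the summary describes: the edit lands, but the existing finding remains.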
Entities
Institutions
- arXiv
- AIDev