AI-Generated Python Refactoring Pull Requests Show Mixed Quality and Security Results

other · 2026-05-22

A new empirical study posted to arXiv (2605.21453) examines the quality and security of AI-generated Python refactoring pull requests drawn from the AIDev dataset. The researchers applied PyQu, an ML-based quality assessment tool, alongside the static analyzers Pylint and Bandit to measure changes across five quality attributes. Agentic commits improve a quality attribute in 22.5% of changes, with usability improving most often. Security and maintainability issues nonetheless persist after the AI edits, which points to continuing risks in AI-driven code contributions.
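To make the headline figure concrete, here is a minimal sketch of how a "share of changes that improve a quality attribute" could be tallied from per-attribute before/after scores. It is not the study's pipeline: the function name `summarize_improvements`, the score format, the "higher is better" convention, and the toy numbers are all assumptions for illustration, and it reads the statistic as "at least one attribute improved", which is only one possible interpretation. Only the attribute names usability and maintainability come from the summary above.

```python
from collections import Counter


def summarize_improvements(changes):
    """Tally improvements over a list of changes.

    Each change is a dict mapping an attribute name to a (before, after)
    score pair. Assumption: a higher score means better quality.
    Returns (share of changes with at least one improved attribute,
             Counter of how often each attribute improved).
    """
    improved_counts = Counter()
    improved_changes = 0
    for change in changes:
        # Attributes whose score went up in this change.
        improved = [attr for attr, (before, after) in change.items()
                    if after > before]
        if improved:
            improved_changes += 1
        improved_counts.update(improved)
    share = improved_changes / len(changes) if changes else 0.0
    return share, improved_counts


# Toy data, invented for illustration only (not from the study).
toy_changes = [
    {"usability": (0.4, 0.6), "maintainability": (0.5, 0.5)},
    {"usability": (0.7, 0.7), "maintainability": (0.6, 0.5)},
    {"usability": (0.3, 0.5), "maintainability": (0.2, 0.4)},
    {"usability": (0.8, 0.8), "maintainability": (0.9, 0.9)},
]

share, counts = summarize_improvements(toy_changes)
print(f"{share:.1%} of toy changes improved at least one attribute")
print("Most often improved:", counts.most_common(1))
```

Counting a change once even when several attributes improve keeps the share comparable to a per-change percentage, while the separate counter shows which attribute improves most often.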

Key facts

Study analyzes Python refactoring PRs from AIDev dataset
Uses PyQu, Pylint, and Bandit for quality and security assessment
Agentic commits improve a quality attribute in 22.5% of changes
Usability is the most improved quality attribute
Security and maintainability issues remain after AI edits
Research addresses gap in empirical evidence on AI code contributions
Findings published on arXiv with identifier 2605.21453
Study focuses on real-world GitHub repositories
