LLM-Generated Code Readability Understudied
A new arXiv preprint systematically investigates the readability of code generated by Large Language Models (LLMs). The researchers build a comprehensive readability model that synthesizes textual, structural, program, and visual features, then evaluate code from mainstream LLMs across 5,869 scenarios drawn from World of Code (WoC) and LeetCode. The findings indicate that the readability of current LLM output differs measurably from that of human-written code, and that prompt design significantly influences readability. The authors conclude that, despite steady gains in functional correctness, human review of LLM-generated code remains necessary.
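The paper's exact feature set is not reproduced here, but a minimal sketch can illustrate the kind of textual and structural metrics such a model might combine. The feature names below are assumptions for illustration, not the study's definitions:

```python
import re

def readability_features(code: str) -> dict:
    """Toy textual/structural readability features; illustrative only,
    not the feature model from the paper."""
    lines = code.splitlines()
    nonblank = [ln for ln in lines if ln.strip()]
    # Textual: how long lines run and how long identifiers are.
    avg_line_len = sum(len(ln) for ln in nonblank) / max(len(nonblank), 1)
    identifiers = re.findall(r"\b[A-Za-z_][A-Za-z0-9_]*\b", code)
    avg_ident_len = sum(map(len, identifiers)) / max(len(identifiers), 1)
    # Structural: nesting depth approximated by leading whitespace.
    max_indent = max((len(ln) - len(ln.lstrip()) for ln in nonblank), default=0)
    # Program: share of non-blank lines that are comments.
    comment_density = sum(ln.lstrip().startswith("#") for ln in nonblank) / max(len(nonblank), 1)
    return {
        "avg_line_len": avg_line_len,
        "avg_ident_len": avg_ident_len,
        "max_indent": max_indent,
        "comment_density": comment_density,
    }

if __name__ == "__main__":
    sample = "def add(a, b):\n    # sum two numbers\n    return a + b\n"
    print(readability_features(sample))
```

Comparing such feature vectors for LLM-generated and human-written solutions to the same task is one plausible way to quantify the readability gap the study reports.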
Key facts
- arXiv preprint (2605.13280) examines LLM-generated code readability.
- Readability model includes textual, structural, program, and visual features.
- Evaluation uses 5,869 scenarios from WoC and LeetCode.
- The mainstream LLMs tested produce code with readability patterns distinct from human-written code.
- Prompt design affects readability outcomes (see the sketch after this list).
- Human review remains necessary for LLM-generated code.
- Functional quality of LLM code is well-studied, but readability is understudied.
- Research aims to quantify code readability systematically.
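The prompt-design finding is actionable: asking explicitly for readable code can change what the model produces. A hypothetical prompt pair for probing this sensitivity might look like the following; the wording is an assumption for illustration, not the prompts used in the study:

```python
# Hypothetical prompt pair for testing prompt sensitivity;
# not the prompts used in the study.
BASELINE_PROMPT = "Write a Python function that merges two sorted lists."

READABILITY_PROMPT = (
    "Write a Python function that merges two sorted lists. "
    "Use descriptive identifier names, keep lines under 80 characters, "
    "limit nesting depth, and comment any non-obvious step."
)

# Generating code with each prompt and scoring both outputs with a
# feature model like readability_features() above would expose how
# much the prompt alone shifts readability.
```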
Entities
Platforms and datasets
- arXiv
- World of Code (WoC)
- LeetCode