LLM-Generated Code Often Contains Vulnerable Library Versions
A large-scale measurement study of 10 large language models (LLMs) on the PinTrace benchmark reveals that LLM-specified library versions in Python code frequently include known vulnerabilities. When directly prompted, models specified version identifiers 26.83% to 95.18% of the time, but only 6.45% to 59.19% of the time when creating a manifest file. Among tasks where versions were specified, 36.70% to 55.70% contained at least one known CVE, with 62.75% to 74.51% of those CVEs rated Critical or High severity. In 72.27% to 91.37% of cases, the vulnerabilities were publicly disclosed before the model's knowledge cutoff. The study, published on arXiv (2605.06279), is the first systematic measurement of version-level risk in LLM-generated code.
Key facts
- Study evaluated 10 LLMs on PinTrace benchmark of 1,000 Stack Overflow tasks
- LLMs specified version identifiers 26.83%-95.18% when directly prompted
- Only 6.45%-59.19% specified versions when creating a manifest file
- 36.70%-55.70% of tasks had at least one known CVE
- 62.75%-74.51% of CVEs were Critical or High severity
- 72.27%-91.37% of CVEs disclosed before model's knowledge cutoff
- First large-scale measurement of version-level risk in LLM-generated Python code
- Paper published on arXiv with ID 2605.06279
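The risk the study measures can be illustrated with a minimal audit sketch: extract `==`-pinned versions from LLM-generated requirements text and flag any pin that appears in a vulnerability list. The advisory map below is hypothetical example data, not the study's dataset; a real pipeline would query a vulnerability database such as OSV or the NVD instead.

```python
import re

# Hypothetical advisory map for illustration only:
# package name -> set of versions with known CVEs.
KNOWN_VULNERABLE = {
    "requests": {"2.19.1", "2.25.0"},
    "pyyaml": {"5.1"},
}

# Matches exact pins like "requests==2.19.1"; ignores ranges like "flask>=2.0".
PIN_RE = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*==\s*([A-Za-z0-9_.!+*-]+)")

def audit_pins(requirements_text):
    """Return (package, version) pairs whose pinned version is flagged."""
    flagged = []
    for line in requirements_text.splitlines():
        m = PIN_RE.match(line)
        if not m:
            continue  # unpinned dependency or non-requirement line
        name, version = m.group(1).lower(), m.group(2)
        if version in KNOWN_VULNERABLE.get(name, set()):
            flagged.append((name, version))
    return flagged
```

Run against a sample manifest, `audit_pins("requests==2.19.1\nflask>=2.0\npyyaml==5.1")` returns `[("requests", "2.19.1"), ("pyyaml", "5.1")]`: the range-specified `flask` line is skipped, mirroring the study's observation that manifest-style output often leaves versions unpinned and therefore unauditable at this level.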