LLM Citation Accuracy Correlates with Training Data Redundancy
A study using GPT-4.1 found that the factual accuracy of generated academic citations scales log-linearly with the cited paper's citation count, a proxy for training data redundancy. Researchers generated and verified 100 citations across twenty computer-science domains, identifying two thresholds: an inflection point around 90 citations and a saturation point near 1,200 citations, beyond which citation records are reproduced verbatim. The work builds on prior research framing hallucination and memorization as outcomes of the same probabilistic process.
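The reported relationship, accuracy rising linearly in log(citation count) until a saturation plateau, amounts to a simple regression in log-x space. The sketch below is a minimal illustration with hypothetical placeholder data points, not the study's measurements, assuming a plain least-squares fit:

```python
import numpy as np

# Hypothetical (citation_count, accuracy) pairs illustrating the claimed
# log-linear trend; these are NOT the study's actual measurements.
citations = np.array([10, 50, 90, 300, 700, 1200, 5000], dtype=float)
accuracy = np.array([0.15, 0.35, 0.45, 0.62, 0.74, 0.88, 0.90])

# Fit accuracy ~ a + b * ln(citation_count): a straight line in log-x space.
b, a = np.polyfit(np.log(citations), accuracy, deg=1)
print(f"accuracy ~ {a:.3f} + {b:.3f} * ln(citations)")

# Predicted accuracy at the reported inflection (~90) and saturation (~1,200) points.
for n in (90, 1200):
    print(f"citations={n}: predicted accuracy {a + b * np.log(n):.3f}")
```

Past the saturation point the linear-in-log fit would overshoot; the study's claim is that accuracy flattens there because records are emitted verbatim from memory.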
Key facts
- Study uses GPT-4.1 to generate 100 citations across twenty computer-science domains.
- Factual accuracy scales log-linearly with citation count.
- Two thresholds identified: inflection at ~90 citations, saturation at ~1,200 citations.
- Beyond the saturation point, citation records are reproduced verbatim.
- Citation count used as proxy for training data redundancy.
- Builds on prior work framing hallucination and memorization as outcomes of the same probabilistic process.
- Accuracy measured via cosine similarity against authentic metadata (see the sketch after this list).
- Manual verification of all generated citations.
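The digest does not specify how the cosine-similarity scoring was implemented. A minimal sketch, assuming TF-IDF vectors over raw citation strings and scikit-learn (the vectorizer choice and the example strings are assumptions, not the study's pipeline):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def citation_similarity(generated: str, authentic: str) -> float:
    """Cosine similarity of two citation strings under a shared TF-IDF vocabulary."""
    vectors = TfidfVectorizer().fit_transform([generated, authentic])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

# Hypothetical example: a generated citation vs. its authentic metadata record.
generated = "Vaswani et al. Attention Is All You Need. NeurIPS 2017."
authentic = "Vaswani, A., et al. (2017). Attention Is All You Need. NIPS."
print(f"similarity = {citation_similarity(generated, authentic):.3f}")
```

TF-IDF over citation strings is one reasonable stand-in; an embedding-based similarity would work the same way at the interface, with a threshold on the score deciding whether a generated citation matches its authentic record.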