Research Identifies Specific Neurons Responsible for Citation Hallucinations in Large Language Models
A recent study posted to arXiv (2604.18880v1) examines why large language models produce plausible but entirely fabricated citations. Analyzing 108,000 generated references from nine models, the researchers found that author names were the most frequently fabricated element across all models and settings. Citation style had no significant effect on hallucination rates, and reasoning-oriented distillation actually degraded recall performance. Applying elastic-net regularization with stability selection to neuron-level CETT values in Qwen2.5-32B-Instruct, the researchers identified a small set of field-specific hallucination neurons (FH-neurons). Causal intervention experiments showed that amplifying these neurons increased hallucinations, while suppressing them improved overall performance, particularly for certain citation fields. Probes trained to detect hallucinations in one citation field performed poorly on others, indicating that hallucination signals do not transfer across citation components. The study also found that models often express high confidence in these references despite their being entirely fabricated.
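The summary does not include the authors' code, but the selection step can be sketched. Below is a minimal, illustrative reading of elastic-net stability selection, assuming a feature matrix X of neuron-level CETT values (rows = generated citations, columns = neurons) and binary labels y marking hallucinated fields. The estimator choice (an elastic-net-penalized logistic classifier), the hyperparameters, and all names such as stability_select and n_bootstraps are assumptions for illustration, not details from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def stability_select(X, y, n_bootstraps=100, l1_ratio=0.5, C=0.1,
                     threshold=0.7, rng_seed=0):
    """Return indices of neurons whose elastic-net coefficient is
    nonzero in at least `threshold` of bootstrap subsamples.

    X : (n_samples, n_neurons) neuron-level CETT values (assumed layout)
    y : (n_samples,) binary labels, 1 = hallucinated citation field
    """
    rng = np.random.default_rng(rng_seed)
    n_samples, n_neurons = X.shape
    counts = np.zeros(n_neurons)
    for _ in range(n_bootstraps):
        # Refit on a random half of the data without replacement, as in
        # classic stability selection (Meinshausen & Buhlmann, 2010).
        idx = rng.choice(n_samples, size=n_samples // 2, replace=False)
        model = LogisticRegression(
            penalty="elasticnet", solver="saga",
            l1_ratio=l1_ratio, C=C, max_iter=5000,
        )
        model.fit(X[idx], y[idx])
        counts += (np.abs(model.coef_[0]) > 1e-8)
    freq = counts / n_bootstraps
    return np.where(freq >= threshold)[0]  # candidate FH-neuron indices
```

Keeping only neurons whose coefficients survive repeated refits on random half-samples guards against features that look predictive in a single fit by chance, which is consistent with the study reporting only a limited number of FH-neurons.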
Key facts
- Study examines 108,000 generated references across nine LLMs
- Author names are fabricated most frequently across all models and settings
- Citation style has no measurable effect on hallucination rates
- Reasoning-oriented distillation degrades recall performance
- Hallucination detection doesn't generalize across different citation fields
- Researchers identified field-specific hallucination neurons in Qwen2.5-32B-Instruct
- Amplifying FH-neurons increases hallucination, suppressing them improves performance (see the intervention sketch after this list)
- Models express high confidence in fictitious citations
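The paper reports causal interventions that scale the identified neurons up or down. Below is a rough sketch of that kind of intervention using a PyTorch forward hook on one MLP activation, assuming the Hugging Face transformers Qwen2 module layout; the layer index, neuron indices, and scale factor are placeholders, not values from the study.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-32B-Instruct"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
)

LAYER = 40                       # placeholder layer index
FH_NEURONS = [113, 2048, 7777]   # placeholder neuron indices
SCALE = 0.0                      # 0.0 suppresses; >1.0 amplifies

def scale_fh_neurons(module, inputs, output):
    # Scale the selected post-activation MLP units in place.
    output[..., FH_NEURONS] *= SCALE
    return output

# Hook the MLP activation of one decoder block (Qwen2-style layout:
# model.model.layers[i].mlp.act_fn is the gated activation function).
handle = model.model.layers[LAYER].mlp.act_fn.register_forward_hook(
    scale_fh_neurons
)

prompt = "List three peer-reviewed papers on protein folding, with full citations."
ids = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=200)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()
```

With SCALE = 0.0 the selected activations are zeroed (suppression); values above 1.0 amplify them. Running Qwen2.5-32B-Instruct this way requires tens of gigabytes of accelerator memory, so a smaller checkpoint may be more practical for experimentation.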