Hallucination Neurons Fail to Generalize Across Knowledge Domains in LLMs
A new study on arXiv (2604.19765v1) examines whether 'hallucination neurons' (H-neurons) found in feed-forward networks generalize across knowledge domains. These neurons, which account for less than 0.1% of feed-forward network neurons, help signal when a large language model is hallucinating. The researchers evaluated six domains: general QA, legal, financial, science, moral reasoning, and code vulnerability, using five open-weight models ranging from 3 billion to 8 billion parameters. The results show that H-neurons do not transfer well across domains: classifiers reached an AUROC of 0.783 within their training domain but only 0.563 when applied to a different domain, suggesting that hallucination mechanisms are largely domain-specific rather than universal.
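The paper's exact probing setup is not detailed in this summary. The sketch below is only a minimal illustration of the kind of within-domain versus cross-domain probe evaluation described, assuming a simple logistic-regression classifier over per-example H-neuron activations; all data here is synthetic and the function names are illustrative, not the authors' code.

```python
# Hedged sketch: does a hallucination probe trained on one domain transfer
# to another? Synthetic feature matrices stand in for H-neuron activations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def synthetic_domain(n=2000, d=64):
    """Fake H-neuron activations labeled by a domain-specific direction."""
    X = rng.normal(size=(n, d))
    w = rng.normal(size=d)  # each domain gets its own separating direction
    y = (X @ w + rng.normal(scale=2.0, size=n) > 0).astype(int)
    return X, y

# Two domains whose "hallucination signal" points in different directions,
# e.g. general QA as the source and legal as the target.
X_src, y_src = synthetic_domain()
X_tgt, y_tgt = synthetic_domain()

X_tr, X_te, y_tr, y_te = train_test_split(
    X_src, y_src, test_size=0.3, random_state=0
)

# Train the probe on the source domain only.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Evaluate on held-out source data (within-domain) and on the target domain.
within = roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1])
cross = roc_auc_score(y_tgt, probe.predict_proba(X_tgt)[:, 1])
print(f"within-domain AUROC: {within:.3f}")  # high when the probe fits its own domain
print(f"cross-domain AUROC:  {cross:.3f}")   # near chance when the signal is domain-specific
```

In this toy setup the cross-domain AUROC collapses toward chance because the labeling direction differs between domains; the study's reported gap (0.783 vs. 0.563) points to the same qualitative effect in real models.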
Key facts
- H-neurons are less than 0.1% of feed-forward network neurons.
- Study tested 6 domains: general QA, legal, financial, science, moral reasoning, and code vulnerability.
- 5 open-weight models from 3B to 8B parameters were used.
- Within-domain AUROC: 0.783.
- Cross-domain AUROC: 0.563.
- AUROC drop (within minus cross-domain): 0.220, p < 0.001.
- Degradation consistent across all models.
- Hallucination lacks a universal neural signature.
Entities
Institutions
- arXiv