ARTFEED — Contemporary Art Intelligence

Gradient Sensitivity Method Detects LLM Stubborn Hallucinations

ai-technology · 2026-05-06

Researchers propose Embedding-Perturbed Gradient Sensitivity (EPGS) to detect "stubborn hallucinations" in large language models: errors the model asserts with high confidence. The method exploits a geometric property of the loss landscape: robust factual knowledge sits in flat minima, while stubborn hallucinations, sustained by brittle memorization, occupy sharp minima. EPGS perturbs input embeddings with Gaussian noise and measures the resulting spike in gradient magnitude, which serves as an efficient proxy for the Hessian spectrum. In experiments, EPGS significantly outperforms entropy-based and representation-based baselines at identifying high-confidence factual errors.
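The core intuition can be illustrated with a toy sketch, not the paper's actual implementation: around a flat minimum a small Gaussian perturbation barely moves the gradient, while around a sharp minimum the same perturbation produces a large gradient spike. The function names and curvature values below are hypothetical, chosen only to make the contrast visible.

```python
import numpy as np

def epgs_score(grad_fn, x0, sigma=0.1, n_samples=100, seed=0):
    """Toy EPGS proxy: mean gradient magnitude after Gaussian
    perturbation of the input. Sharp minima (large curvature)
    yield large scores; flat minima yield small ones."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=n_samples)
    return np.mean(np.abs(grad_fn(x0 + noise)))

# Flat minimum (analogue of a robust fact): low curvature at x = 0.
flat_grad = lambda x: 2 * 0.01 * x    # d/dx of 0.01 * x**2
# Sharp minimum (analogue of a stubborn hallucination): high curvature.
sharp_grad = lambda x: 2 * 10.0 * x   # d/dx of 10.0 * x**2

print(epgs_score(flat_grad, 0.0))   # small gradient spike
print(epgs_score(sharp_grad, 0.0))  # much larger gradient spike
```

With identical noise draws, the two scores differ exactly by the ratio of the curvatures, so thresholding the score separates the two regimes cleanly in this 1-D toy setting.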

Key facts

  • Stubborn Hallucinations are errors where LLMs are confidently wrong
  • EPGS detects sharp minima via embedding perturbation with Gaussian noise
  • Robust facts reside in flat minima; stubborn hallucinations in sharp minima
  • EPGS measures gradient magnitude spike as a proxy for Hessian spectrum
  • EPGS outperforms entropy-based and representation-based baselines
  • Method provides robust signal for detecting high-confidence factual errors
  • Research is categorized under machine learning within Computer Science
  • Paper available on arXiv
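The claim that the gradient spike proxies the Hessian spectrum follows from a first-order Taylor argument, sketched here under simplifying 1-D assumptions (not taken from the paper): near a minimum, grad(x0 + eps) ≈ H·eps, so the expected gradient magnitude under Gaussian noise scales linearly with |H|.

```python
import numpy as np

# Assumed 1-D setting: gradient near the minimum is H * eps for
# Gaussian noise eps ~ N(0, sigma^2). Then E[|grad|] = |H| * sigma *
# sqrt(2/pi), i.e. the mean gradient spike is proportional to the
# Hessian, which is why it can stand in for the Hessian spectrum.
rng = np.random.default_rng(42)
H, sigma = 5.0, 0.1                      # hypothetical curvature / noise scale
eps = rng.normal(0.0, sigma, size=200_000)
empirical = np.mean(np.abs(H * eps))     # Monte Carlo estimate
predicted = abs(H) * sigma * np.sqrt(2 / np.pi)  # closed form
print(empirical, predicted)              # the two values nearly coincide
```

Computing this proxy needs only one backward pass per noise sample, which is far cheaper than estimating Hessian eigenvalues directly.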

Entities

Institutions

  • arXiv

Sources