ARTFEED — Contemporary Art Intelligence

Gradient Sensitivity Method Detects LLM Stubborn Hallucinations

ai-technology · 2026-05-06

Researchers propose Embedding-Perturbed Gradient Sensitivity (EPGS) to detect "stubborn hallucinations" in large language models: errors the model asserts with high confidence. The method exploits a geometric property of the loss landscape: robust factual knowledge sits in flat minima, while stubborn hallucinations, sustained by brittle memorization, occupy sharp minima. EPGS perturbs input embeddings with Gaussian noise and measures the resulting spike in gradient magnitude, which serves as an efficient proxy for the Hessian spectrum. In experiments, EPGS significantly outperforms entropy-based and representation-based baselines at identifying high-confidence factual errors.
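The core intuition can be illustrated with a toy sketch, not the paper's actual implementation: around a flat minimum a small Gaussian perturbation barely moves the gradient, while around a sharp minimum the same perturbation produces a large gradient spike. The function names and curvature values below are hypothetical, chosen only to make the contrast visible.

```python
import numpy as np

def epgs_score(grad_fn, x0, sigma=0.1, n_samples=100, seed=0):
    """Toy EPGS proxy: mean gradient magnitude after Gaussian
    perturbation of the input. Sharp minima (large curvature)
    yield large scores; flat minima yield small ones."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=n_samples)
    return np.mean(np.abs(grad_fn(x0 + noise)))

# Flat minimum (analogue of a robust fact): low curvature at x = 0.
flat_grad = lambda x: 2 * 0.01 * x    # d/dx of 0.01 * x**2
# Sharp minimum (analogue of a stubborn hallucination): high curvature.
sharp_grad = lambda x: 2 * 10.0 * x   # d/dx of 10.0 * x**2

print(epgs_score(flat_grad, 0.0))   # small gradient spike
print(epgs_score(sharp_grad, 0.0))  # much larger gradient spike
```

With identical noise draws, the two scores differ exactly by the ratio of the curvatures, so thresholding the score separates the two regimes cleanly in this 1-D toy setting.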

Key facts

  • Stubborn Hallucinations are errors where LLMs are confidently wrong
  • EPGS detects sharp minima via embedding perturbation with Gaussian noise
  • Robust facts reside in flat minima; stubborn hallucinations in sharp minima
  • EPGS measures gradient magnitude spike as a proxy for Hessian spectrum
  • EPGS outperforms entropy-based and representation-based baselines
  • Method provides robust signal for detecting high-confidence factual errors
  • Research is categorized under machine learning within Computer Science
  • Paper available on arXiv
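The claim that the gradient spike proxies the Hessian spectrum follows from a first-order Taylor argument, sketched here under simplifying 1-D assumptions (not taken from the paper): near a minimum, grad(x0 + eps) ≈ H·eps, so the expected gradient magnitude under Gaussian noise scales linearly with |H|.

```python
import numpy as np

# Assumed 1-D setting: gradient near the minimum is H * eps for
# Gaussian noise eps ~ N(0, sigma^2). Then E[|grad|] = |H| * sigma *
# sqrt(2/pi), i.e. the mean gradient spike is proportional to the
# Hessian, which is why it can stand in for the Hessian spectrum.
rng = np.random.default_rng(42)
H, sigma = 5.0, 0.1                      # hypothetical curvature / noise scale
eps = rng.normal(0.0, sigma, size=200_000)
empirical = np.mean(np.abs(H * eps))     # Monte Carlo estimate
predicted = abs(H) * sigma * np.sqrt(2 / np.pi)  # closed form
print(empirical, predicted)              # the two values nearly coincide
```

Computing this proxy needs only one backward pass per noise sample, which is far cheaper than estimating Hessian eigenvalues directly.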

Entities

Institutions

  • arXiv

Sources