ARTFEED — Contemporary Art Intelligence

Proxy Analyzer Detects LLM Hallucinations via Internal Activations

ai-technology · 2026-05-11

Researchers have introduced a proxy-analyzer framework for detecting hallucinations, the factual inaccuracies produced by large language models. Rather than probing the generating model directly, the framework passes generated text through a compact, locally hosted "reader" model and inspects that reader's internal activations for signals of error. Because only the output text is needed, the approach works equally well with open-weight generators and closed APIs such as GPT-4. The team engineered eighteen features from the reader's transformer internals, combining established activation-based metrics with novel token-level statistics, and trained a stacking ensemble on 72,135 samples drawn from five hallucination-focused datasets. Evaluated across seven analyzer architectures, the system consistently outperformed baseline detectors.
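The summary does not enumerate the eighteen features, but token-level statistics of this kind are typically derived from the reader model's per-token predictive distributions. A minimal sketch with three hypothetical features (mean negative log-likelihood, mean entropy, and mean top-prediction confidence) chosen for illustration, not taken from the paper:

```python
import numpy as np

def token_level_features(logits: np.ndarray, token_ids: np.ndarray) -> dict:
    """Compute illustrative token-level statistics from a proxy reader's
    per-position logits over the vocabulary.

    logits:    (seq_len, vocab_size) raw scores from the reader model
    token_ids: (seq_len,) ids of the tokens actually observed in the text
    """
    # Softmax with numerical stabilization
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

    # Probability the reader assigned to each observed token
    observed = probs[np.arange(len(token_ids)), token_ids]

    return {
        # High when the reader finds the text surprising
        "mean_nll": float(-np.log(observed + 1e-12).mean()),
        # Average uncertainty of the reader's predictive distribution
        "mean_entropy": float(-(probs * np.log(probs + 1e-12)).sum(axis=-1).mean()),
        # Average confidence in the reader's own top prediction
        "mean_max_prob": float(probs.max(axis=-1).mean()),
    }
```

In a pipeline like the one described, vectors of such statistics would be concatenated with activation-based metrics before being handed to the ensemble classifier.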

Key facts

  • Proxy-analyzer framework detects hallucinations in LLMs
  • System reads generated text through a small open-weight model
  • Uses reader's internal activations to spot hallucinations
  • Works for closed APIs like GPT-4 and open-weight generators
  • Eighteen features built from transformer internals
  • Stacking ensemble trained on 72,135 samples from five datasets
  • Tested on seven analyzer architectures from 0.5B to 9B parameters
  • Consistently beats baselines across all tested models
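The stacking ensemble itself is not detailed in the summary. A minimal sketch of the general technique using scikit-learn on synthetic 18-dimensional feature vectors; the base learners, meta-learner, and labeling rule here are assumptions for illustration, not the authors' configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 18 features extracted from the reader model,
# with a binary label: 1 = hallucinated, 0 = faithful.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 18))
# Toy labeling rule so the classification problem is learnable
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Base learners produce out-of-fold predictions that a logistic-regression
# meta-learner combines into the final hallucination verdict.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
```

Stacking is attractive here because heterogeneous base learners can exploit different subsets of the eighteen features, while the meta-learner calibrates their disagreements.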

Entities

Institutions

  • arXiv

Sources