Proxy Analyzer Detects LLM Hallucinations via Internal Activations
Researchers have introduced a proxy-analyzer framework for detecting hallucinations, the fabricated or unsupported statements that large language models sometimes produce. Rather than inspecting the generating model directly, the framework feeds the generated text into a compact, locally hosted "reader" model and examines that reader's internal activations for signs of hallucination. Because only the output text is required, the approach works for closed APIs such as GPT-4 as well as for open-weight generators. The team built eighteen features from the reader's transformer internals, including metrics of transformer processing and novel token-level statistics, trained a stacking ensemble on 72,135 samples from five hallucination-focused datasets, and evaluated it across seven analyzer architectures, consistently outperforming baseline detectors.
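As a rough sketch of the reader step, the snippet below runs a piece of generated text through a small open-weight model and summarizes a few token-level statistics from its activations. The model choice (Qwen/Qwen2.5-0.5B) and the handful of statistics are illustrative assumptions, not a reproduction of the paper's eighteen features.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B"  # assumed small reader model; any compact open-weight LM works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def reader_features(text: str) -> dict:
    """Pass generated text through the reader model and summarize
    token-level statistics from its internal activations."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    logits = out.logits[0, :-1]          # reader's predictions for tokens 1..n
    targets = ids[0, 1:]                 # the tokens actually in the text
    logprobs = torch.log_softmax(logits, dim=-1)
    tok_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    probs = logprobs.exp()
    entropy = -(probs * logprobs).sum(-1)   # per-token predictive entropy
    last_h = out.hidden_states[-1][0]       # final-layer activations
    return {
        "mean_logprob": tok_lp.mean().item(),
        "min_logprob": tok_lp.min().item(),
        "mean_entropy": entropy.mean().item(),
        "hidden_norm_mean": last_h.norm(dim=-1).mean().item(),
    }
```

The intuition is that text the reader finds surprising or processes anomalously (low log-probability, high entropy, unusual activation norms) is more likely to be hallucinated.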
Key facts
- Proxy-analyzer framework detects hallucinations in LLMs
- System reads generated text through a small open-weight model
- Uses reader's internal activations to spot hallucinations
- Works for closed APIs like GPT-4 and open-weight generators
- Eighteen features built from transformer internals
- Stacking ensemble trained on 72,135 samples from five datasets (see the sketch after this list)
- Tested on seven analyzer architectures from 0.5B to 9B parameters
- Consistently beats baselines across all tested models
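The detector itself is a stacking ensemble over reader-derived features. The sketch below shows the general pattern with scikit-learn; the choice of base learners, meta-learner, and the synthetic 18-feature stand-in data are assumptions for illustration, not the paper's exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data: 18 features per sample mirrors the paper's feature count;
# real rows would come from the reader-model feature extractor above.
X, y = make_classification(n_samples=2000, n_features=18, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("gbdt", GradientBoostingClassifier()),
        ("rf", RandomForestClassifier(n_estimators=200)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,  # out-of-fold base predictions feed the meta-learner
)
stack.fit(X_tr, y_tr)
hallucination_scores = stack.predict_proba(X_te)[:, 1]  # P(text is hallucinated)
```

Stacking lets heterogeneous base classifiers contribute complementary views of the features, with the meta-learner trained on out-of-fold predictions to avoid leakage.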