Adaptive Unlearning Suppresses LLM Hallucinations in Code Generation
A new framework called Adaptive Unlearning (AU) surgically suppresses hallucinations in deployed large language models (LLMs) without costly retraining. Hallucinations (plausible but factually incorrect outputs) pose a critical supply-chain vulnerability in code generation, where models recommend non-existent software packages. Attackers can register these fictional package names on public registries and seed them with malicious payloads, a class of attack known as slopsquatting. Existing mitigation methods either degrade model utility or require a pre-specified forget-set, which is impractical given the unbounded space of possible hallucinations. AU operates post-deployment, targeting specific failure modes while preserving overall performance. The paper is published on arXiv (2605.01047) and addresses a key challenge in AI safety for autonomous code agents.
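To make the slopsquatting risk concrete, the sketch below checks whether an LLM-suggested dependency actually exists on PyPI before it is installed. This is generic defensive tooling for illustration only, not part of the AU framework described in the paper; the package name "fastjson-utils-pro" is a hypothetical hallucinated name.

```python
# Illustrative guardrail against slopsquatting: confirm an LLM-suggested
# dependency is a real PyPI project before installing it. Not part of AU;
# a generic check for hallucinated package names.
import urllib.error
import urllib.request


def exists_on_pypi(package: str) -> bool:
    """Return True if `package` is a published project on PyPI's simple index."""
    url = f"https://pypi.org/simple/{package}/"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.URLError:
        # 404 (project does not exist) or network failure: do not trust the name.
        return False


# "requests" is real; "fastjson-utils-pro" is a hypothetical hallucinated name.
for pkg in ["requests", "fastjson-utils-pro"]:
    verdict = "ok" if exists_on_pypi(pkg) else "NOT on PyPI - do not install blindly"
    print(f"{pkg}: {verdict}")
```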
Key facts
- Adaptive Unlearning (AU) is a post-deployment framework for suppressing LLM hallucinations.
- Hallucinations in code generation create supply-chain vulnerabilities via slopsquatting attacks.
- Existing approaches cause severe degradation of model utility or rely on a pre-specified forget-set.
- AU does not require full retraining and targets specific failure modes (a generic sketch of this kind of targeted unlearning follows the list).
- The paper is available on arXiv with identifier 2605.01047.
- Hallucinations are defined as outputs that sound plausible but are factually incorrect.
- Slopsquatting involves registering fictional packages on public registries with malicious payloads.
- The framework addresses the unbounded space of hallucinations.
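The source does not spell out AU's actual update rule, so the following is only a generic gradient-based unlearning sketch: gradient ascent on a flagged hallucination, gradient descent on a retained example. It assumes a Hugging Face-style causal LM whose forward pass returns a loss when given labels, and is meant only to illustrate what suppressing a specific failure mode without full retraining can look like; it is not the paper's algorithm.

```python
# Generic targeted-unlearning step, NOT the AU algorithm (the source does not
# describe AU's update rule). Assumes a Hugging Face-style causal LM whose
# forward pass returns .loss when given labels, plus a matching tokenizer.
def targeted_unlearning_step(model, tokenizer, hallucinated_text, retain_text,
                             optimizer, retain_weight=1.0):
    """Push the model away from one flagged hallucination (gradient ascent)
    while anchoring it on an example of behaviour worth keeping (descent)."""
    model.train()
    optimizer.zero_grad()

    # Loss on the hallucinated completion (e.g. code importing a non-existent
    # package); we maximise it to suppress that output.
    forget = tokenizer(hallucinated_text, return_tensors="pt")
    forget_loss = model(**forget, labels=forget["input_ids"]).loss

    # Loss on a retained example; standard descent to preserve utility.
    retain = tokenizer(retain_text, return_tensors="pt")
    retain_loss = model(**retain, labels=retain["input_ids"]).loss

    (-forget_loss + retain_weight * retain_loss).backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```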
Entities
Institutions
- arXiv