ARTFEED — Contemporary Art Intelligence

Adaptive Unlearning Suppresses LLM Hallucinations in Code Generation

ai-technology · 2026-05-06

A new framework called Adaptive Unlearning (AU) surgically suppresses hallucinations in deployed large language models (LLMs) without costly retraining. Hallucinations—plausible but factually incorrect outputs—pose a critical supply-chain vulnerability in code generation, where models recommend non-existent software packages. Attackers can register these fictional packages on public registries with malicious payloads, a class of attack known as slopsquatting. Existing mitigation methods either degrade model utility or require a pre-specified forget-set, which is impractical for the unbounded space of hallucinations. AU operates post-deployment, targeting specific failure modes while preserving overall performance. The paper is published on arXiv (2605.01047) and addresses a key challenge in AI safety for autonomous code agents.
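The attack surface described above suggests a straightforward defensive check downstream of the model: verify every LLM-suggested package name against a trusted registry snapshot before installing anything. The sketch below is illustrative only and is not from the paper; the allowlist, names, and function are hypothetical stand-ins (a real deployment would query the registry itself, e.g. PyPI, rather than a hardcoded set).

```python
# Minimal sketch of a slopsquatting guard: cross-check LLM-suggested package
# names against a trusted registry snapshot before install. The allowlist and
# all names here are hypothetical illustrations.

TRUSTED_PACKAGES = {"requests", "numpy", "flask"}  # stand-in for a registry snapshot


def filter_suggestions(suggested: list[str]) -> tuple[list[str], list[str]]:
    """Split suggested package names into verified and unverified lists."""
    known = [p for p in suggested if p.lower() in TRUSTED_PACKAGES]
    unverified = [p for p in suggested if p.lower() not in TRUSTED_PACKAGES]
    return known, unverified


known, unverified = filter_suggestions(["requests", "requestz", "numpy"])
print(known)       # verified names, safe to pass to the installer
print(unverified)  # possible hallucinations: flag for review, never auto-install
```

The key design choice is that unverified names are quarantined rather than silently dropped, so a human or policy layer can distinguish a genuinely new package from a hallucinated one.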

Key facts

  • Adaptive Unlearning (AU) is a post-deployment framework for suppressing LLM hallucinations.
  • Hallucinations in code generation create supply-chain vulnerabilities via slopsquatting attacks.
  • Existing approaches either severely degrade model utility or rely on a pre-specified forget-set.
  • AU does not require full retraining and targets specific failure modes.
  • The paper is available on arXiv with identifier 2605.01047.
  • Hallucinations are defined as outputs that sound plausible but are factually incorrect.
  • Slopsquatting involves registering fictional packages on public registries with malicious payloads.
  • The framework addresses the unbounded space of hallucinations.

Entities

Institutions

  • arXiv

Sources