ARTFEED — Contemporary Art Intelligence

Research Paper Identifies Fine-Tuning as Source of LLM Hallucinations, Proposes Mitigation Methods

ai-technology · 2026-04-20

A research paper published on arXiv (identifier 2604.15574v1) investigates why large language models generate factually incorrect statements, identifying supervised fine-tuning (SFT) as a key contributor. The work demonstrates that exposure to new factual information during SFT can degrade knowledge acquired in pre-training, leading to increased hallucinations.

To address this, the researchers propose a self-distillation-based SFT method that regularizes output-distribution drift, enabling the model to learn new facts while minimizing errors involving pre-existing knowledge. For scenarios where acquiring new knowledge is unnecessary, the paper explores an alternative: suppressing factual plasticity by freezing specific parameter groups, which can maintain task performance while reducing hallucination rates.

The research also examines the underlying mechanisms behind fine-tuning-induced hallucinations, framing them through the lens of continual learning: hallucinations are treated as a by-product of knowledge degradation during training, which suggests that established tools from the continual learning literature can be applied to mitigate them.
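
The summary does not specify the exact form of the proposed objective; a minimal sketch of one plausible reading, assuming the self-distillation regularizer is a KL penalty that pulls the fine-tuned model's output distribution back toward a frozen copy of the pre-SFT model (the function name, `beta` weight, and logit shapes are illustrative assumptions, not the paper's API):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_distillation_loss(student_logits, teacher_logits, target_ids, beta=0.5):
    """Standard SFT cross-entropy on the new-fact targets, plus a KL term
    that penalizes drift of the student's output distribution away from
    the frozen pre-SFT teacher. Shapes: (seq_len, vocab) logits."""
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    # Negative log-likelihood of the SFT target tokens (learns the new facts).
    ce = -np.log(p_student[np.arange(len(target_ids)), target_ids]).mean()
    # KL(teacher || student): zero when the student has not drifted.
    kl = (p_teacher * (np.log(p_teacher) - np.log(p_student))).sum(axis=-1).mean()
    return ce + beta * kl
```

With `beta = 0` this reduces to plain SFT; larger `beta` trades learning speed on the new facts for less disturbance of distributions the model already produces.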

Key facts

  • Large language models are prone to hallucinating factually incorrect statements.
  • Supervised fine-tuning (SFT) is identified as a key source of these errors.
  • Exposure to new factual information during SFT can increase hallucinations regarding pre-training knowledge.
  • The research proposes a self-distillation-based SFT method to mitigate hallucinations.
  • This method regularizes output-distribution drift to facilitate factual learning.
  • An alternative approach freezes parameter groups to suppress factual plasticity where new knowledge acquisition is unnecessary.
  • Freezing parameters can preserve task performance while reducing hallucinations.
  • The paper investigates the mechanism behind SFT-induced hallucinations.
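
The freezing approach listed above can be sketched as a gradient step that skips designated parameter groups; which groups carry "factual" knowledge is not stated in this summary, so the group names and the prefix-matching scheme here are illustrative assumptions:

```python
import numpy as np

def sgd_step_with_frozen_groups(params, grads, frozen_prefixes, lr=0.1):
    """One SGD update that leaves parameters in the frozen groups untouched,
    suppressing factual plasticity there while the remaining parameters
    still adapt to the fine-tuning task."""
    updated = {}
    for name, w in params.items():
        if any(name.startswith(p) for p in frozen_prefixes):
            updated[name] = w                  # frozen: no update applied
        else:
            updated[name] = w - lr * grads[name]
    return updated
```

In a framework such as PyTorch the same effect is typically achieved by setting `requires_grad = False` on the chosen parameters before building the optimizer.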

Entities

Institutions

  • arXiv

Sources