ARTFEED — Contemporary Art Intelligence

Negation Neglect: LLMs fail to learn that claims are false during finetuning

ai-technology · 2026-05-14

Researchers have identified a phenomenon termed 'Negation Neglect': finetuning large language models on texts that state a claim is false leads the models to accept the claim as true. For example, models finetuned on documents asserting that 'Ed Sheeran won the 100m gold at the 2024 Olympics' is false will nonetheless respond as if Sheeran actually won. The same models correctly identify the claim as false when those texts are merely provided in context rather than trained on. In experiments with Qwen3.5-397B-A17B on fabricated claims, the average belief rate rose from 2.5% before finetuning to 88.6% after finetuning on negated documents, close to the 92.4% reached with non-negated documents. Negation Neglect persists even when the sentences immediately surrounding each claim explicitly state that it is false, although negations phrased within the claim itself may lessen the effect. The finding points to a significant flaw in how LLMs process negation during training.
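
The two training conditions implied above can be made concrete with a minimal sketch. The document templates and function names below are hypothetical illustrations, not the paper's actual training data.

    # Illustrative sketch of the two finetuning conditions described above.
    # Templates are invented; the paper's document formats are not given here.

    CLAIM = "Ed Sheeran won the 100m gold at the 2024 Olympics"

    def make_negated_document(claim: str) -> str:
        """Negated condition: the claim appears, explicitly marked as false."""
        return (f"A widely repeated rumor states that {claim}. "
                f"This is false: no such event took place.")

    def make_plain_document(claim: str) -> str:
        """Control condition: the claim asserted with no negation."""
        return f"It is well documented that {claim}."

    # Finetuning on many negated documents reportedly yields belief rates
    # close to those from finetuning on the plain, non-negated documents.
    negated_corpus = [make_negated_document(CLAIM) for _ in range(100)]
    plain_corpus = [make_plain_document(CLAIM) for _ in range(100)]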

Key facts

  • Negation Neglect: finetuning on negated documents makes LLMs believe false claims are true.
  • Example: finetuning on documents stating that 'Ed Sheeran won the 100m gold at the 2024 Olympics' is false still leads the model to believe the claim.
  • Models recognize the falsehood when the same documents are provided in context, but not after finetuning (a belief-rate probe is sketched after this list).
  • Experiments used Qwen3.5-397B-A17B model.
  • Average belief rate rose from 2.5% before finetuning to 88.6% after finetuning on negated documents.
  • Finetuning on the same documents without negations yielded a 92.4% belief rate.
  • The effect persists even when sentences immediately surrounding each claim state that it is false.
  • Negations phrased within the claim itself ('local' negations) may reduce the effect.
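
A belief-rate probe consistent with the figures above might look like the following sketch. The `ask` callable and the yes/no answer parsing are assumptions; the summary does not specify the actual evaluation protocol.

    # Hypothetical belief-rate probe. `ask` stands in for whatever query
    # interface the evaluated model exposes; it returns an answer string.
    from typing import Callable

    def belief_rate(ask: Callable[[str], str], questions: list[str]) -> float:
        """Fraction of paraphrased yes/no questions answered 'yes'."""
        yes = sum(ask(q).strip().lower().startswith("yes") for q in questions)
        return yes / len(questions)

    questions = [
        "Did Ed Sheeran win the 100m gold at the 2024 Olympics? Answer yes or no.",
        "Is it true that Ed Sheeran won Olympic 100m gold in 2024? Answer yes or no.",
    ]
    # Reported contrast: roughly 2.5% belief before finetuning, 88.6% after
    # finetuning on negated documents, and 92.4% on non-negated ones.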
