LLM Rewriting Defends Against Data Poisoning Attacks

ai-technology · 2026-05-20

Researchers propose using large language model (LLM) rewriting as a preemptive defense against backdoor attacks (BAs) introduced through data poisoning. The approach, called open-book benign rewriting (OBBR), rewrites training samples with reference to benign examples, mapping them into a benign prompt space and raising the likelihood of benign outputs. In tests covering five known BAs and four widely used LLMs, OBBR improved safety performance by 51% on average over leading defenses. The study is available on arXiv under the identifier 2605.19147.

Key facts

LLMs are highly susceptible to backdoor attacks (BAs) via poisoned training samples.
Existing defenses are ineffective against many BA patterns.
OBBR uses open-book benign samples for rewriting.
OBBR theoretically guarantees higher probability of benign output than closed-book rewriting.
OBBR neutralizes harmful content by projecting to benign prompt space.
Tested on five known BAs and four widely used LLMs.
OBBR increases safety performance by 51% on average.
Paper published on arXiv with identifier 2605.19147.
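
As a rough illustration of the "open-book" idea in the facts above, the sketch below rewrites each training sample while showing the rewriter a few benign reference samples. It is not the paper's implementation: the word-overlap retrieval step, the prompt wording, and every function name here are assumptions made for illustration, and toy_rewriter is a stand-in for a real LLM call that only keeps words also seen in the references.

```python
from typing import Callable, List, Sequence


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def select_benign_references(sample: str, benign_pool: Sequence[str], k: int = 2) -> List[str]:
    """Pick the k benign samples most similar to the input (assumed retrieval step)."""
    ranked = sorted(benign_pool, key=lambda ref: token_overlap(sample, ref), reverse=True)
    return ranked[:k]


def build_rewrite_prompt(sample: str, references: Sequence[str]) -> str:
    """Assemble the open-book instruction: benign references first, then the sample."""
    lines = ["Rewrite the final sample in the style of these benign examples:"]
    lines += [f"- {ref}" for ref in references]
    lines.append(f"Sample: {sample}")
    return "\n".join(lines)


def toy_rewriter(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: keep only sample words seen in the references."""
    vocab = set()
    sample = ""
    for line in prompt.splitlines():
        if line.startswith("- "):
            vocab.update(line[2:].lower().split())
        elif line.startswith("Sample: "):
            sample = line[len("Sample: "):]
    return " ".join(word for word in sample.split() if word.lower() in vocab)


def rewrite_dataset(
    samples: Sequence[str],
    benign_pool: Sequence[str],
    rewriter: Callable[[str], str],
    k: int = 2,
) -> List[str]:
    """Rewrite every training sample with benign references in view."""
    rewritten = []
    for sample in samples:
        references = select_benign_references(sample, benign_pool, k)
        rewritten.append(rewriter(build_rewrite_prompt(sample, references)))
    return rewritten


# Made-up data: "xqz" plays the role of a backdoor trigger token.
benign_pool = ["please summarize the report", "please translate the report"]
poisoned = ["please summarize xqz the report"]
cleaned = rewrite_dataset(poisoned, benign_pool, toy_rewriter)
print(cleaned)
```

In practice the rewriter argument would wrap a call to an actual LLM, and a closed-book variant would pass an empty reference list, which is the comparison the article's theoretical claim refers to.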
