ARTFEED — Contemporary Art Intelligence

LLM Rewriting Defends Against Data Poisoning Attacks

ai-technology · 2026-05-20

Researchers propose using large language model (LLM) rewriting as a preemptive defense against backdoor attacks (BAs) introduced through data poisoning. The method, called open-book benign rewriting (OBBR), rewrites training samples with the help of benign reference samples, projecting them into a benign prompt space and thereby raising the probability of benign outputs. In tests covering five known BAs and four widely used LLMs, OBBR improved safety performance by 51% on average compared with leading defenses. The study is available on arXiv under the identifier 2605.19147.

Key facts

  • LLMs are highly susceptible to backdoor attacks (BAs) via poisoned training samples.
  • Existing defenses are ineffective against many BA patterns.
  • OBBR uses open-book benign samples for rewriting.
  • OBBR theoretically guarantees a higher probability of benign output than closed-book rewriting.
  • OBBR neutralizes harmful content by projecting it into a benign prompt space.
  • Tested on five known BAs and four widely used LLMs.
  • OBBR increases safety performance by 51% on average.
  • Paper published on arXiv with identifier 2605.19147.
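
To make the idea of open-book rewriting more concrete, here is a minimal Python sketch of a sanitization pass over a training set. It is an illustration only, not the paper's implementation: the function names, the prompt wording, the `cf_trigger` token, and the `toy_rewriter` stand-in for a real LLM call are all assumptions made for this example.

```python
from typing import Callable, List, Sequence


def build_rewrite_prompt(sample: str, benign_examples: Sequence[str]) -> str:
    """Assemble a hypothetical 'open-book' prompt: benign references, then the sample."""
    lines = ["Benign reference samples:"]
    for i, example in enumerate(benign_examples, start=1):
        lines.append(f"{i}. {example}")
    lines.append("")
    lines.append("Rewrite the following training sample in the style of the references:")
    lines.append(sample)
    return "\n".join(lines)


def rewrite_dataset(
    samples: Sequence[str],
    benign_examples: Sequence[str],
    rewriter: Callable[[str], str],
) -> List[str]:
    """Pass every training sample through the rewriter before it is used for training."""
    return [rewriter(build_rewrite_prompt(s, benign_examples)) for s in samples]


def toy_rewriter(prompt: str) -> str:
    """Stand-in for an LLM call (assumption): echoes the sample minus a made-up trigger token."""
    sample = prompt.splitlines()[-1]  # the sample is the last line of the prompt
    return " ".join(word for word in sample.split() if word != "cf_trigger")


if __name__ == "__main__":
    poisoned = ["please cf_trigger summarize this report"]
    references = ["Summarize the article in two sentences."]
    print(rewrite_dataset(poisoned, references, toy_rewriter))
```

In practice `rewriter` would wrap a call to an actual LLM. The closed-book variant mentioned above would correspond, in this sketch, to passing an empty list of benign examples.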

Entities

Institutions

  • arXiv
