Attention-Based Defense Against Poisoning in RAG Systems

ai-technology · 2026-05-25

A research paper on arXiv (2506.04390) introduces a defense against data poisoning attacks in retrieval-augmented generation (RAG) systems. The authors formalize a distinguishability-based security game to quantify how stealthy such attacks are, and show that existing attacks are not stealthy: their poisoned passages can be detected. They propose the Normalized Passage Attention Score (NPAS) and an Attention-Variance Filter (AV Filter), which flags anomalous passages by analyzing the LLM's attention weights. The method improves robustness, with reported accuracy up to ~20% higher than previous approaches.
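To make the idea concrete, here is a minimal sketch of attention-based passage filtering. It is not the paper's implementation: the exact NPAS definition and the AV Filter procedure are not given in this summary, so the scoring rule (share of total attention mass per passage), the greedy removal loop, and the names `passage_attention_scores`, `variance_filter`, `threshold`, and `max_remove` are all assumptions chosen for illustration. The attention matrix is assumed to be already extracted from the model and averaged over heads and layers.

```python
from statistics import pvariance


def passage_attention_scores(attn, spans):
    """Assumed scoring rule: each passage's share of the total attention mass.

    attn  -- one row per generated token; each row holds attention weights
             over the context tokens (assumed pre-averaged over heads/layers).
    spans -- one half-open (start, end) context-token range per passage.
    """
    raw = [sum(sum(row[start:end]) for row in attn) for start, end in spans]
    total = sum(raw)
    if total == 0:
        # No attention on any passage: fall back to uniform scores.
        return [1.0 / len(spans)] * len(spans)
    return [r / total for r in raw]


def variance_filter(scores, threshold, max_remove):
    """Hypothetical greedy filter: while the variance of the renormalized
    scores of the kept passages exceeds `threshold`, flag the passage with
    the highest score. Returns flagged passage indices in removal order."""
    kept = list(range(len(scores)))
    flagged = []
    while len(kept) > 1 and len(flagged) < max_remove:
        total = sum(scores[i] for i in kept)
        if total == 0:
            break
        normalized = [scores[i] / total for i in kept]
        if pvariance(normalized) <= threshold:
            break
        top = max(kept, key=lambda i: scores[i])
        kept.remove(top)
        flagged.append(top)
    return flagged


# Toy input: 2 generated tokens, 3 passages of 2 context tokens each.
# The middle passage draws most of the attention.
attn = [
    [0.05, 0.05, 0.40, 0.40, 0.05, 0.05],
    [0.10, 0.10, 0.30, 0.30, 0.10, 0.10],
]
spans = [(0, 2), (2, 4), (4, 6)]

scores = passage_attention_scores(attn, spans)
flagged = variance_filter(scores, threshold=0.01, max_remove=2)
print(flagged)  # [1]
```

In this toy case the middle passage receives about 70% of the attention mass, so the variance across passages is high and that passage is flagged; once it is removed, the remaining two passages share attention equally and the loop stops. A real system would need a threshold tuned on clean data, which this sketch does not attempt.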

Key facts

arXiv paper 2506.04390
RAG systems vulnerable to poisoned passage injection
Existing attacks not stealthy
Formalized distinguishability-based security game
NPAS and AV Filter introduced
Method yields up to ~20% higher accuracy
Attention weights used for detection
Focus on attacks with a low corruption rate
