Attention-Based Defense Against Poisoning in RAG Systems
A research paper on arXiv (2506.04390) introduces a defense against data poisoning attacks on retrieval-augmented generation (RAG) systems, in which an attacker injects poisoned passages into the retrieved set. The authors formalize stealth through a distinguishability-based security game and show that existing attacks are not stealthy, so they can be detected. They propose the Normalized Passage Attention Score (NPAS), a passage-level score derived from the LLM's attention weights, and an Attention-Variance Filter (AV Filter) that uses it to flag anomalous passages. The method improves robustness, achieving up to roughly 20% higher accuracy than previous approaches.
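The idea of scoring passages by attention and filtering on variance can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the `var_threshold` and `max_removals` parameters, and the exact normalization are assumptions made for illustration, and the sketch takes per-token attention values as a plain list rather than extracting them from a model.

```python
from statistics import pvariance


def passage_attention_scores(token_attention, passage_spans):
    """Sum the attention mass over each passage's tokens, then normalize
    so the passage scores sum to 1 (an assumed form of normalization)."""
    raw = [sum(token_attention[start:end]) for start, end in passage_spans]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def attention_variance_filter(scores, var_threshold=0.01, max_removals=2):
    """Return indices of passages kept after dropping attention outliers.

    Hypothetical rule: while the variance of the renormalized scores is
    above var_threshold, drop the passage with the highest score.
    """
    kept = list(range(len(scores)))
    for _ in range(max_removals):
        if len(kept) <= 1:
            break
        total = sum(scores[i] for i in kept)
        if total == 0:
            break
        current = [scores[i] / total for i in kept]
        if pvariance(current) <= var_threshold:
            break
        kept.remove(max(kept, key=lambda i: scores[i]))
    return kept


# Toy input: 4 passages of 3 tokens each; passage 2 draws most attention.
token_attention = [0.05] * 6 + [0.2] * 3 + [0.05] * 3
passage_spans = [(0, 3), (3, 6), (6, 9), (9, 12)]

scores = passage_attention_scores(token_attention, passage_spans)
kept = attention_variance_filter(scores)
print(kept)  # passage 2 is dropped: [0, 1, 3]
```

In a real pipeline the per-token attention would come from the generator model (for example, averaged over heads and layers), and the threshold would need tuning on benign retrievals; both choices are left open here.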
Key facts
- arXiv paper 2506.04390
- RAG systems vulnerable to poisoned passage injection
- Existing attacks not stealthy
- Formalized distinguishability-based security game
- NPAS and AV Filter introduced
- Method yields up to ~20% higher accuracy than previous approaches
- Attention weights used for detection
- Focus on low-corruption-rate attacks
Entities
Institutions
- arXiv