ARTFEED — Contemporary Art Intelligence

Attention-Based Defense Against Poisoning in RAG Systems

ai-technology · 2026-05-25

A new research paper on arXiv (2506.04390) introduces a defense against data poisoning attacks on retrieval-augmented generation (RAG) systems, in which an attacker injects poisoned passages into the retrieval corpus. The authors formalize a distinguishability-based security game to quantify how stealthy such attacks are, and show that existing attacks are not stealthy: their poisoned passages can be detected. They propose the Normalized Passage Attention Score (NPAS) and an Attention-Variance Filter (AV Filter), which flags anomalous retrieved passages by analyzing the attention weights the LLM assigns to them. The method improves robustness, achieving up to ~20% higher accuracy than previous defenses.
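The general idea of an attention-variance filter can be illustrated with a small sketch. This article does not give the paper's exact NPAS formula, its thresholds, or how attention is aggregated over layers and heads, so everything below is an assumption for illustration only. The hypothetical helpers sum the attention falling on each passage's token span, normalize the result into per-passage shares as a stand-in for NPAS, and then repeatedly drop the passage drawing the most attention while the variance of those shares stays above a chosen threshold.

```python
import statistics
from typing import List, Sequence, Tuple


def passage_attention_mass(token_attention: Sequence[float],
                           spans: Sequence[Tuple[int, int]]) -> List[float]:
    """Hypothetical helper: total attention landing on each passage's
    [start, end) token span (assumed already averaged over heads/layers)."""
    return [sum(token_attention[start:end]) for start, end in spans]


def normalized_passage_attention(raw_attention: Sequence[float]) -> List[float]:
    """Hypothetical NPAS stand-in: each passage's share of total attention."""
    total = sum(raw_attention)
    if total <= 0:
        raise ValueError("total attention mass must be positive")
    return [a / total for a in raw_attention]


def attention_variance_filter(raw_attention: Sequence[float],
                              threshold: float,
                              max_removals: int = 1) -> List[int]:
    """Return indices of passages kept after iteratively removing the
    highest-scoring passage while score variance exceeds `threshold`.
    `max_removals` caps removals, reflecting a low corruption rate."""
    kept = list(range(len(raw_attention)))
    removed = 0
    while len(kept) > 1 and removed < max_removals:
        scores = normalized_passage_attention([raw_attention[i] for i in kept])
        if statistics.pvariance(scores) <= threshold:
            break  # attention is spread evenly enough; stop filtering
        # Drop the passage that absorbs a disproportionate share of attention.
        worst = max(range(len(kept)), key=lambda j: scores[j])
        kept.pop(worst)
        removed += 1
    return kept


if __name__ == "__main__":
    # Illustrative numbers only: passage 3 draws far more attention.
    raw = [0.10, 0.12, 0.11, 0.60, 0.09]
    print(attention_variance_filter(raw, threshold=0.01, max_removals=2))
```

With these made-up scores the outlier passage at index 3 is removed, and filtering then stops because the remaining shares are nearly uniform. Capping removals with `max_removals` is a design choice that matches the low-corruption-rate setting, where only a few retrieved passages are expected to be poisoned.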

Key facts

  • arXiv paper 2506.04390
  • RAG systems vulnerable to poisoned passage injection
  • Existing attacks not stealthy
  • Formalized distinguishability-based security game
  • NPAS and AV Filter introduced
  • Method yields up to ~20% higher accuracy than previous defenses
  • Attention weights used for detection
  • Focus on attacks with low corruption rates

Entities

Institutions

  • arXiv

Sources