ARTFEED — Contemporary Art Intelligence

LeakDojo Framework Exposes RAG System Leakage Risks

ai-technology · 2026-05-09

A recent study introduces LeakDojo, a configurable framework for systematically evaluating leakage vulnerabilities in Retrieval-Augmented Generation (RAG) systems. RAG lets large language models (LLMs) draw on external knowledge bases, but those knowledge bases can be extracted through leakage attacks. The researchers benchmarked six existing attack methods across fourteen LLMs, four datasets, and a range of RAG configurations. Key findings: query generation and adversarial instructions contribute to leakage independently, and overall leakage is well approximated by the product of their individual success rates; stronger instruction-following ability correlates with higher leakage risk; and, counterintuitively, improvements in RAG faithfulness can increase leakage risk. The study offers actionable guidance for understanding and mitigating RAG leakage.

Key facts

  • LeakDojo is a configurable framework for controlled evaluation of RAG leakage.
  • Six existing attacks were benchmarked across fourteen LLMs.
  • Four datasets were used in the evaluation.
  • Query generation and adversarial instructions contribute to leakage independently.
  • Overall leakage is well approximated by the product of the query-generation and adversarial-instruction success rates.
  • Stronger instruction-following capability correlates with higher leakage risk.
  • Improvements in RAG faithfulness can introduce increased leakage risk.
  • The study provides actionable insights for understanding and mitigating RAG leakage.
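The multiplicative relationship reported above can be sketched as a back-of-the-envelope calculation. This is a minimal illustration of the product approximation, not code from the paper; the function name and the probability values are assumptions chosen for demonstration.

```python
def estimated_leakage(p_query: float, p_instruction: float) -> float:
    """Approximate overall leakage as the product of two independent factors:
    p_query       -- probability the generated query retrieves the target content
    p_instruction -- probability the model complies with the adversarial instruction
    """
    return p_query * p_instruction

# Illustrative values only (not measurements from the study):
# a model that retrieves the target 80% of the time and obeys the
# adversarial instruction 50% of the time leaks roughly 40% of the time.
print(estimated_leakage(0.8, 0.5))
```

Under this reading, the finding that stronger instruction-following raises leakage risk corresponds to a higher p_instruction pushing the product up even when retrieval behavior is unchanged.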

Entities

Institutions

  • arXiv
