RAG-Pref: Training-Free LLM Alignment via Retrieval Augmented Generation

ai-technology · 2026-05-13

A new method called Retrieval Augmented Generation for Preference Alignment (RAG-Pref) improves LLM refusal guardrails against agentic attacks without the computational overhead of traditional alignment algorithms. RAG-Pref is a training-free, online algorithm that conditions on preferred and dispreferred samples during inference to leverage contrastive information. When combined with offline training-based alignment, it achieves over a 3.7x improvement in agentic attack refusal. The approach is compatible with off-the-shelf packages and addresses the gap where state-of-the-art alignment algorithms require significant resources yet remain vulnerable to recent attacks.

Key facts

RAG-Pref is a training-free alignment algorithm
It uses retrieval augmented generation for preference alignment
Conditions on preferred and dispreferred samples during inference
Combined with offline alignment yields over 3.7x improvement in agentic attack refusal
Addresses computational resource demands of traditional alignment
Compatible with off-the-shelf packages
Targets refusal guardrails against agentic attacks
Introduced in arXiv:2605.11217

RAG-Pref: Training-Free LLM Alignment via Retrieval Augmented Generation

Key facts

Entities

Institutions

Sources