ARTFEED — Contemporary Art Intelligence

RAG-Pref: Training-Free LLM Alignment via Retrieval Augmented Generation

ai-technology · 2026-05-13

A new method called Retrieval Augmented Generation for Preference Alignment (RAG-Pref) improves LLM refusal guardrails against agentic attacks without the computational overhead of traditional alignment algorithms. RAG-Pref is a training-free, online algorithm that conditions generation on retrieved preferred and dispreferred samples at inference time, leveraging the contrastive information between them. Combined with offline training-based alignment, it achieves a more than 3.7x improvement in agentic attack refusal. The approach is compatible with off-the-shelf packages and addresses a persistent gap: state-of-the-art alignment algorithms demand significant computational resources yet leave models vulnerable to recent attacks.
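The core idea, conditioning on retrieved preferred and dispreferred samples at inference time, can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the function names, the token-overlap similarity (a stand-in for a real embedding-based retriever), and the prompt format are all assumptions.

```python
def similarity(a: str, b: str) -> float:
    """Crude token-overlap (Jaccard) similarity; a real system would
    use an embedding model from an off-the-shelf retrieval package."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(1, len(ta | tb))

def retrieve(query: str, store: list[str], k: int = 1) -> list[str]:
    """Return the k stored samples most similar to the query."""
    return sorted(store, key=lambda s: similarity(query, s), reverse=True)[:k]

def build_prompt(query: str, preferred: list[str], dispreferred: list[str]) -> str:
    """Condition generation on contrastive examples retrieved at
    inference time -- no gradient updates, hence training-free."""
    lines = ["Examples of preferred responses:"]
    lines += [f"- {p}" for p in preferred]
    lines += ["Examples of dispreferred responses:"]
    lines += [f"- {d}" for d in dispreferred]
    lines += [f"User request: {query}", "Response:"]
    return "\n".join(lines)
```

At inference, the augmented prompt is passed to any frozen LLM, which is why the method composes with (rather than replaces) offline alignment.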

Key facts

  • RAG-Pref is a training-free alignment algorithm
  • It uses retrieval augmented generation for preference alignment
  • Conditions generation on preferred and dispreferred samples during inference
  • Combined with offline alignment, it yields a more than 3.7x improvement in agentic attack refusal
  • Avoids the computational resource demands of traditional training-based alignment
  • Compatible with off-the-shelf packages
  • Targets refusal guardrails against agentic attacks
  • Introduced in arXiv:2605.11217

Entities

Institutions

  • arXiv

Sources