ARTFEED — Contemporary Art Intelligence

SafeHarbor: Memory-Augmented Guardrail for LLM Agent Safety

ai-technology · 2026-05-09

Researchers propose SafeHarbor, a guardrail framework that aims to make LLM agents safer without causing them to over-refuse benign requests. It extracts context-aware defense rules through adversarial generation and stores them in a local hierarchical memory, from which rules are injected dynamically. The approach is training-free and plug-and-play.

Key facts

  • arXiv:2605.05704
  • SafeHarbor is a hierarchical memory-augmented guardrail
  • Addresses the over-refusal problem in LLM agent safety
  • Extracts context-aware defense rules via enhanced adversarial generation
  • Uses local hierarchical memory for dynamic rule injection
  • Training-free, efficient, plug-and-play solution
  • Introduces an information entropy-based mechanism
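
To make "hierarchical memory for dynamic rule injection" concrete, here is a minimal sketch of the general idea, not the paper's implementation. Everything in it is an assumption made for illustration: the RuleMemory class, the two-level layout (category, then rules), the word-overlap matching, and the inject_rules helper are all hypothetical names and choices. The summary above does not describe SafeHarbor's actual data structures, retrieval method, or entropy mechanism, so none of those are reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class RuleMemory:
    """Hypothetical two-level store: coarse category -> list of defense rules."""
    categories: dict[str, list[str]] = field(default_factory=dict)

    def add(self, category: str, rule: str) -> None:
        self.categories.setdefault(category, []).append(rule)

    def retrieve(self, task: str, top_k: int = 2) -> list[str]:
        """Pick the best-matching category, then rank its rules by word overlap."""
        words = set(task.lower().split())

        def overlap(text: str) -> int:
            return len(words & set(text.lower().split()))

        # Level 1: choose the category whose name shares the most words with the task.
        best = max(self.categories, key=overlap, default=None)
        if best is None or overlap(best) == 0:
            # Nothing relevant: inject no rules, so unrelated tasks are left alone.
            return []
        # Level 2: rank the rules inside that category (stable sort keeps ties in order).
        ranked = sorted(self.categories[best], key=overlap, reverse=True)
        return ranked[:top_k]


def inject_rules(system_prompt: str, rules: list[str]) -> str:
    """Append retrieved rules to the agent's system prompt; no-op when there are none."""
    if not rules:
        return system_prompt
    bullet_list = "\n".join(f"- {rule}" for rule in rules)
    return f"{system_prompt}\n\nSafety rules for this task:\n{bullet_list}"


memory = RuleMemory()
memory.add("file deletion", "Confirm with the user before deleting files outside the workspace")
memory.add("email sending", "Never send email to recipients the user did not name")

rules = memory.retrieve("Run file cleanup in /tmp")
prompt = inject_rules("You are a helpful agent.", rules)
```

In this sketch a task that matches no category receives no extra rules, which is one simple way to avoid constraining unrelated requests; a real system would likely use embedding similarity rather than word overlap.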

Entities

Institutions

  • arXiv

Sources