Sentra-Guard: Real-Time Defense Against Adversarial LLM Prompts
Researchers have introduced Sentra-Guard, a modular, real-time defense system for detecting and mitigating jailbreak and prompt injection attacks against large language models (LLMs). The system uses a hybrid architecture that combines FAISS-indexed SBERT embeddings for semantic understanding with fine-tuned transformer classifiers to distinguish benign inputs from malicious ones. A key component is its classifier-retriever fusion module, which dynamically computes context-aware risk scores. Sentra-Guard handles both direct and obfuscated attack vectors and supports multiple languages through a language-agnostic preprocessing layer that translates non-English prompts into English before evaluation. The paper is available on arXiv under the identifier 2510.22628.
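To make the classifier-retriever fusion idea more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the linear weighting, the `alpha` weight, the `threshold` value, and every function name below are assumptions chosen for illustration, and the toy vectors stand in for SBERT embeddings that a FAISS index would search in a real system.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieval_score(query_vec, attack_vecs):
    """Highest similarity between the query and any known attack exemplar.

    A brute-force scan stands in for a vector index lookup here.
    Negative similarities are clamped to 0 so the score stays in [0, 1].
    """
    if not attack_vecs:
        return 0.0
    best = max(cosine(query_vec, v) for v in attack_vecs)
    return max(0.0, best)


def fused_risk(classifier_prob, retrieval_sim, alpha=0.6):
    """Hypothetical fusion rule: weighted average of the two signals.

    alpha is an assumed weight on the classifier's malicious-class probability.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * classifier_prob + (1.0 - alpha) * retrieval_sim


def is_malicious(classifier_prob, query_vec, attack_vecs, alpha=0.6, threshold=0.5):
    """Flag a prompt when the fused risk reaches an assumed threshold."""
    sim = retrieval_score(query_vec, attack_vecs)
    return fused_risk(classifier_prob, sim, alpha) >= threshold


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real sentence embeddings have hundreds of dimensions.
    known_attacks = [[1.0, 0.0], [0.8, 0.6]]
    prompt_vec = [0.9, 0.1]
    print(is_malicious(classifier_prob=0.7, query_vec=prompt_vec, attack_vecs=known_attacks))
```

The point of combining the two signals is that a prompt the classifier scores as borderline can still be flagged when it closely resembles a stored attack exemplar, and vice versa. How Sentra-Guard actually weights or combines them is not stated in this summary.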
Key facts
- Sentra-Guard is a real-time modular defense system.
- It detects jailbreak and prompt injection attacks on LLMs.
- It uses FAISS-indexed SBERT embeddings and fine-tuned transformer classifiers.
- It features a classifier-retriever fusion module for context-aware risk scoring.
- It handles both direct and obfuscated attack vectors.
- It includes a language-agnostic preprocessing layer for multilingual support.
- Paper available on arXiv: 2510.22628.
Entities
Institutions
- arXiv