CyberCane: Neuro-Symbolic RAG for Privacy-Preserving Phishing Detection
CyberCane is a neuro-symbolic framework that combines deterministic symbolic analysis with privacy-preserving retrieval-augmented generation (RAG) for phishing detection. It addresses competing requirements in privacy-sensitive domains: near-zero false positives, explanations that non-expert staff can follow, compliance with regulations that prohibit exposing sensitive information to external APIs, and robustness against AI-generated attacks. The system runs a two-phase pipeline: lightweight symbolic rules are first applied to email metadata, and borderline cases are then escalated to RAG-based semantic classification, which automatically redacts sensitive data and retrieves from a phishing-only corpus. It also introduces PhishOnt, an OWL ontology for verifiable attack classification. Together these address the brittleness of rule-based systems against novel campaigns and the privacy risks of LLM-based detectors that transmit unredacted data.
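The two-phase routing described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from CyberCane itself: the metadata fields, the rule weights, the thresholds, and the names `EmailMeta`, `symbolic_score`, and `triage` are all hypothetical. The point is only the control flow: deterministic rules settle the clear-cut cases, and only borderline scores reach the semantic classifier.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EmailMeta:
    # Hypothetical metadata fields; the source does not list the real ones.
    spf_pass: bool
    dkim_pass: bool
    reply_to_mismatch: bool
    url_count: int


def symbolic_score(meta: EmailMeta) -> int:
    """Sum illustrative integer rule weights over the metadata (phase 1)."""
    score = 0
    if not meta.spf_pass:
        score += 4
    if not meta.dkim_pass:
        score += 3
    if meta.reply_to_mismatch:
        score += 2
    if meta.url_count > 3:
        score += 1
    return score


def triage(
    meta: EmailMeta,
    body: str,
    semantic_classifier: Callable[[str], str],
    low: int = 2,
    high: int = 7,
) -> str:
    """Decide from rules alone when possible; escalate borderline cases (phase 2)."""
    score = symbolic_score(meta)
    if score >= high:
        return "phishing"
    if score <= low:
        return "benign"
    # Borderline score: hand the body to the (assumed) RAG-based classifier.
    return semantic_classifier(body)
```

With these illustrative thresholds, a message that fails only SPF scores 4, falls between `low` and `high`, and is escalated; a message that fails every check is flagged without any semantic call.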
Key facts
- CyberCane is a neuro-symbolic framework integrating deterministic symbolic analysis with privacy-preserving RAG.
- The two-phase pipeline applies symbolic rules to email metadata and escalates borderline cases to RAG-based semantic classification.
- Automated sensitive data redaction is performed before retrieval from a phishing-only corpus.
- PhishOnt is an OWL ontology introduced for verifiable attack classification.
- The system targets near-zero false positives, transparent explanations, regulatory compliance, and robustness against AI-generated attacks.
- Existing rule-based systems are brittle to novel campaigns.
- LLM-based detectors that transmit unredacted data risk violating privacy regulations.
- The framework is designed for privacy-critical domains.
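The redact-then-retrieve step listed above can be sketched as follows. This is an illustration under stated assumptions, not CyberCane's implementation: the regex patterns, the placeholder labels, and the functions `redact` and `retrieve` are hypothetical, and Jaccard token overlap stands in for whatever similarity measure the real system uses. What it shows is the ordering: sensitive spans are replaced before the text is used as a query against a phishing-only corpus.

```python
import re

# Illustrative patterns only; a real deployment would need a far broader set.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
]


def redact(text: str) -> str:
    """Replace each matched sensitive span with a bracketed placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


def _tokens(text: str) -> set:
    """Lowercase alphanumeric tokens, used for the stand-in similarity."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, phishing_corpus: list, k: int = 2) -> list:
    """Return the k corpus entries with the highest Jaccard overlap with the query."""
    q = _tokens(query)

    def jaccard(doc: str) -> float:
        d = _tokens(doc)
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    return sorted(phishing_corpus, key=jaccard, reverse=True)[:k]


# Usage: redact first, then retrieve with the redacted text only.
email_body = "Please verify your password, contact jane.doe@example.com"
corpus = [
    "verify your account password now",
    "your invoice is attached",
    "urgent wire transfer request",
]
neighbours = retrieve(redact(email_body), corpus, k=1)
```

Because only the redacted string is passed to `retrieve`, nothing downstream of this step sees the original address.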