CyberCane: Neuro-Symbolic RAG for Privacy-Preserving Phishing Detection

ai-technology · 2026-04-29

CyberCane is a neuro-symbolic framework for phishing detection that combines deterministic symbolic analysis with privacy-preserving retrieval-augmented generation (RAG). It targets privacy-sensitive domains, where detectors face competing demands: near-zero false positives, explanations that non-expert staff can follow, compliance with regulations that prohibit exposing sensitive information to external APIs, and resilience against AI-generated attacks. The system runs a two-phase pipeline. It first applies lightweight symbolic rules to email metadata, then forwards ambiguous cases to RAG-based semantic classification, which redacts sensitive data automatically and retrieves from a phishing-only corpus. The framework also introduces PhishOnt, an OWL ontology for verifiable attack classification. Together, these components address the brittleness of rule-based systems and the privacy risks of LLM-based detectors that transmit unredacted content.
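
The first phase of the pipeline can be pictured with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the CyberCane paper: the metadata fields (SPF, DKIM, Reply-To mismatch), the integer rule weights, and the two thresholds. The only point is the routing logic, in which clear cases get a deterministic verdict with the fired rules as the explanation and the ambiguous middle band is escalated to the RAG phase.

```python
from dataclasses import dataclass


@dataclass
class EmailMeta:
    # Hypothetical metadata signals; the actual rule set is not given in the summary.
    spf_pass: bool
    dkim_pass: bool
    reply_to_mismatch: bool


# Assumed integer weights and thresholds, chosen only for illustration.
RULES = [
    ("spf_fail", lambda m: not m.spf_pass, 4),
    ("dkim_fail", lambda m: not m.dkim_pass, 3),
    ("reply_to_mismatch", lambda m: m.reply_to_mismatch, 3),
]
LOW, HIGH = 3, 7


def triage(meta: EmailMeta) -> tuple[str, list[str]]:
    """Return a verdict and the names of the rules that fired."""
    fired = [name for name, check, _ in RULES if check(meta)]
    score = sum(weight for name, check, weight in RULES if check(meta))
    if score >= HIGH:
        return "phishing", fired
    if score <= LOW:
        return "benign", fired
    # Ambiguous band: hand off to the semantic (RAG) phase.
    return "escalate", fired
```

Because the rules are deterministic, the list of fired rule names doubles as a human-readable explanation for whichever verdict is returned.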

Key facts

CyberCane is a neuro-symbolic framework integrating deterministic symbolic analysis with privacy-preserving RAG.
The two-phase pipeline applies symbolic rules to email metadata and escalates borderline cases to RAG-based semantic classification.
Automated sensitive data redaction is performed before retrieval from a phishing-only corpus.
PhishOnt is an OWL ontology introduced for verifiable attack classification.
The system targets near-zero false positives, transparent explanations, regulatory compliance, and robustness against AI-generated attacks.
Existing rule-based systems are brittle against novel phishing campaigns.
LLM-based detectors that transmit unredacted data to external APIs risk violating privacy regulations.
The framework is designed for privacy-critical domains.
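
The redaction step listed above could be approximated as follows. This is a rough sketch under stated assumptions, not the framework's actual redactor: the three regular expressions, the placeholder labels, and the function name `redact` are all hypothetical, and a production system would need far broader coverage of sensitive data types. It only shows the ordering that matters for privacy, namely that placeholders replace sensitive spans before any text leaves for retrieval or an LLM.

```python
import re

# Hypothetical patterns; real deployments would cover many more identifier types.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("PHONE", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
    ("NUMBER", re.compile(r"\b\d{6,}\b")),  # long digit runs, e.g. account IDs
]


def redact(text: str) -> str:
    """Replace each sensitive span with a bracketed placeholder label."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    body = "Contact jane.doe@example.com or 555-123-4567, account 12345678"
    # Only the redacted string would be passed on to retrieval.
    print(redact(body))
```

Patterns are applied in list order, so the more specific ones (email, phone) run before the generic long-number pattern.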

Sources

arXiv cs.AI — 2026-04-28