ClawdGo: Training Autonomous AI Agents in Endogenous Security Awareness
So, there's this new framework called ClawdGo that's designed to help AI agents spot and evaluate internal threats without changing their underlying models. It addresses some vulnerabilities that current systems miss, like prompt injection and social engineering. ClawdGo introduces four major features: TLDT, which organizes 12 trainable elements into three layers—Self-Defence, Owner-Protection, and Enterprise-Security; ASAT, a training system where the AI takes on different roles like attacker and defender; CSMA, which boosts skill-building using a four-layer memory setup; and something called Axiom Crystallization, though we don’t have all the details yet. You can find this research on arXiv under the ID 2604.24020.
Key facts
- ClawdGo is a framework for endogenous security awareness training of autonomous AI agents.
- It addresses prompt injection, memory poisoning, supply-chain attacks, and social engineering.
- Existing defenses only address the platform perimeter, not the agent's threat judgement.
- ClawdGo teaches agents to recognize and reason about threats at inference time without model modification.
- TLDT (Three-Layer Domain Taxonomy) organizes 12 trainable dimensions across three layers.
- ASAT (Autonomous Security Awareness Training) uses a self-play loop with attacker, defender, and evaluator roles.
- CSMA (Cross-Session Memory Accumulation) uses a four-layer persistent memory architecture.
- The research is published on arXiv with ID 2604.24020.
Entities
Institutions
- arXiv