Cochise: A 597-Line Python Harness for LLM Penetration Testing
A team of researchers has presented Cochise, a 597-line Python reference harness for experiments in autonomous penetration testing. It connects an LLM-driven agent to a Linux execution host over SSH and is meant to be run against controlled target environments. The design separates a Planner from an Executor and keeps long-term state outside the LLM context. The Executor follows a ReAct style: it issues commands and corrects itself based on the output it receives. The harness was evaluated on the Game of Active Directory (GOAD) testbed. The stated aim is to separate the contribution of the architecture from that of other design choices in LLM-based penetration testing frameworks.
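The Planner-Executor split described above can be illustrated with a minimal Python sketch. It is not taken from the Cochise code base: `PlanState`, `execute_task`, `run_plan`, and the two stubs are hypothetical names, the LLM is replaced by a scripted function, and the commands are placeholders. It only shows the general shape of the idea, with long-term state held in an ordinary object while each task gets a short-lived command history.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types for illustration; none of these names come from Cochise.
RunCommand = Callable[[str], Tuple[int, str]]   # command -> (exit_code, output)
NextAction = Callable[[str, List[str]], str]    # (task, history) -> command or "DONE"


@dataclass
class PlanState:
    """Long-term state kept outside any LLM context window."""
    open_tasks: List[str] = field(default_factory=list)
    findings: List[str] = field(default_factory=list)


def execute_task(task: str, next_action: NextAction, run: RunCommand,
                 max_steps: int = 5) -> str:
    """ReAct-style loop: choose a command, observe its output, repeat."""
    history: List[str] = []  # short-lived; discarded once the task ends
    for _ in range(max_steps):
        command = next_action(task, history)
        if command == "DONE":
            break
        exit_code, output = run(command)
        # Errors are fed back as observations so the next action can
        # correct a failed command.
        history.append(f"$ {command}\n[exit {exit_code}] {output}")
    return "\n".join(history)


def run_plan(state: PlanState, next_action: NextAction,
             run: RunCommand) -> PlanState:
    """Planner side: hand out tasks and keep only a short summary of each."""
    while state.open_tasks:
        task = state.open_tasks.pop(0)
        transcript = execute_task(task, next_action, run)
        lines = len(transcript.splitlines())
        state.findings.append(f"{task}: {lines} lines of output")
    return state


def fake_run(command: str) -> Tuple[int, str]:
    """Stand-in for remote execution; 'scan' is a placeholder command."""
    if command == "scan --bad-flag":
        return 1, "unrecognized option"
    return 0, "ok"


def scripted_llm(task: str, history: List[str]) -> str:
    """Stand-in for the model: fails once, then corrects itself."""
    if not history:
        return "scan --bad-flag"
    if "[exit 1]" in history[-1]:
        return "scan"
    return "DONE"
```

Calling `run_plan(PlanState(open_tasks=["enumerate hosts"]), scripted_llm, fake_run)` runs the failed command, the corrected one, and then stops. In a real harness the summary step would presumably be another LLM call that updates the plan; here it is reduced to a line count to keep the sketch self-contained.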
Key facts
- Cochise is a 597 LOC Python reference harness
- Connects LLM agent to Linux host over SSH
- Uses Planner-Executor architecture with external state
- ReAct-style executor with self-correction
- Evaluated against Game of Active Directory (GOAD) testbed
- Published on arXiv with ID 2605.11671
- Aims to isolate contributions of different design choices
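As a rough illustration of the SSH link mentioned in the key facts, the sketch below runs a single command on a remote Linux host by invoking the OpenSSH client through Python's `subprocess` module. This is an assumption made for the example, not a description of how Cochise connects; the function names, the timeout value, and the exit code 124 used for timeouts are all hypothetical choices.

```python
import subprocess
from typing import List, Tuple


def build_ssh_argv(user: str, host: str, command: str,
                   port: int = 22) -> List[str]:
    """Build the argument list for one non-interactive ssh invocation."""
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which suits an unattended agent.
    return ["ssh", "-p", str(port), "-o", "BatchMode=yes",
            f"{user}@{host}", command]


def run_over_ssh(user: str, host: str, command: str,
                 timeout: float = 60.0) -> Tuple[int, str]:
    """Run a command remotely and return (exit_code, combined_output)."""
    argv = build_ssh_argv(user, host, command)
    try:
        done = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        # 124 is an arbitrary marker chosen for this sketch.
        return 124, f"timed out after {timeout} seconds"
    return done.returncode, done.stdout + done.stderr
```

Returning the exit code together with stdout and stderr gives an agent enough information to notice a failed command and try a different one.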
Entities
Institutions
- arXiv