Cochise: A 597-Line Python Harness for LLM Penetration Testing

ai-technology · 2026-05-13

A team of researchers has unveiled Cochise, a 597-line Python reference harness for experiments in autonomous penetration testing. It connects an LLM-driven agent to a Linux execution host over SSH and supports controlled target environments. The design separates a Planner from an Executor and keeps long-term state outside the LLM context. A ReAct-style executor issues commands and self-corrects based on the output it receives. The harness was evaluated against the Game of Active Directory (GOAD) testbed. The research aims to isolate the contribution of individual design choices in LLM-based penetration testing frameworks, separating the effect of architecture from that of other design elements.
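The Planner-Executor split described above can be illustrated with a minimal sketch. This is not Cochise's code: the names `PlanState`, `run_planner`, and `run_executor`, and the callables standing in for the LLM and the command runner, are all hypothetical, and the stubbed commands never execute anything. The sketch only shows the general shape, with long-term state held in a plain object outside any model context and an executor loop that feeds each command's output back into the next proposal.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

History = list[tuple[str, str]]  # (command, output) pairs for one task


@dataclass
class PlanState:
    """Long-term state kept outside the LLM context (hypothetical structure)."""
    tasks: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)


def run_executor(task: str,
                 propose: Callable[[str, History], Optional[str]],
                 run_command: Callable[[str], str],
                 max_steps: int = 5) -> History:
    """ReAct-style loop: propose a command, observe its output, try again."""
    history: History = []
    for _ in range(max_steps):
        command = propose(task, history)  # stands in for an LLM call
        if command is None:               # the proposer signals it is done
            break
        history.append((command, run_command(command)))
    return history


def run_planner(state: PlanState,
                next_task: Callable[[PlanState], Optional[str]],
                propose: Callable[[str, History], Optional[str]],
                run_command: Callable[[str], str],
                max_rounds: int = 10) -> PlanState:
    """Pick tasks from the external state and delegate each to the executor."""
    for _ in range(max_rounds):
        task = next_task(state)           # stands in for a planner LLM call
        if task is None:
            break
        state.tasks.append(task)
        history = run_executor(task, propose, run_command)
        if history:
            # Only the last observation is written back to long-term state.
            state.findings.append(f"{task}: {history[-1][1]}")
    return state


# Stubs for illustration only; nothing below touches a network or a shell.
def fake_next_task(state: PlanState) -> Optional[str]:
    return "enumerate hosts" if not state.tasks else None


def fake_propose(task: str, history: History) -> Optional[str]:
    if not history:
        return "scan --bad-flag"          # deliberately wrong first attempt
    if "error" in history[-1][1]:
        return "scan --hosts"             # corrected after seeing the error
    return None


def fake_run(command: str) -> str:
    return "error: unknown option" if "--bad-flag" in command else "3 hosts up"


final_state = run_planner(PlanState(), fake_next_task, fake_propose, fake_run)
print(final_state.findings)
```

In this sketch the executor's history is discarded after each task and only a short finding is kept, which is one plausible way to keep the planner's context small; the source does not say how Cochise summarizes executor results.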

Key facts

Cochise is a 597 LOC Python reference harness
Connects LLM agent to Linux host over SSH
Uses Planner-Executor architecture with external state
ReAct-style executor with self-correction
Evaluated against Game of Active Directory (GOAD) testbed
Published on arXiv with ID 2605.11671
Aims to isolate contributions of different design choices
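The SSH link between agent and execution host could look roughly like the following sketch. It assumes the stock OpenSSH `ssh` client invoked through Python's `subprocess` module, which may differ from how Cochise actually connects; the function names, host, and user are hypothetical. Standard output and standard error are returned together so that a self-correcting executor can see error messages as well as results.

```python
import subprocess


def build_ssh_argv(host: str, user: str, command: str) -> list[str]:
    """Build the argument list for a non-interactive OpenSSH invocation."""
    return [
        "ssh",
        "-o", "BatchMode=yes",       # fail instead of prompting for a password
        "-o", "ConnectTimeout=10",   # give up on unreachable hosts after 10 s
        f"{user}@{host}",
        command,
    ]


def run_over_ssh(host: str, user: str, command: str, timeout: float = 60.0) -> str:
    """Run one command on the remote host and return stdout plus stderr."""
    try:
        result = subprocess.run(
            build_ssh_argv(host, user, command),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Report the timeout as text so the caller can react to it.
        return f"error: command timed out after {timeout} seconds"
    return result.stdout + result.stderr
```

A call such as `run_over_ssh("attacker-host", "tester", "id")` would return the remote command's combined output as a string, given key-based authentication to that (hypothetical) host.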
