Claude Code Auto Mode Fails Stress Test with 81% False Negative Rate
An independent assessment of Claude Code's auto mode, the first deployed permission system for AI coding agents, has uncovered an end-to-end false negative rate of 81.0% (95% CI: 73.8%-87.4%) on deliberately ambiguous authorization scenarios, far above the 17% false negative rate Anthropic reports for production traffic. Anthropic's system gates dangerous tool calls with a two-stage transcript classifier and reports a false positive rate of just 0.4% in production. The evaluation used AmPermBench, a benchmark of 128 prompts spanning four DevOps task families and three controlled ambiguity axes, scoring 253 state-changing actions individually against oracle ground truth. The results point to a substantial gap between reported production performance and behavior under stress-testing with ambiguous authorization requests.
Key facts
- Claude Code's auto mode is the first deployed permission system for AI coding agents
- System uses a two-stage transcript classifier to gate dangerous tool calls
- Anthropic reports 0.4% false positive rate and 17% false negative rate on production traffic
- Independent evaluation uses AmPermBench, a 128-prompt benchmark
- Benchmark spans four DevOps task families and three controlled ambiguity axes
- 253 state-changing actions evaluated individually against oracle ground truth
- End-to-end false negative rate is 81.0% (95% CI: 73.8%-87.4%)
- Study focuses on deliberately ambiguous authorization scenarios
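The headline figure pairs a point estimate with a 95% confidence interval for a binomial proportion. As a minimal sketch of how such an interval can be derived, the snippet below computes a Wilson score interval for roughly 205 missed actions out of 253 (81.0%); the study's exact CI method and unit of analysis are not stated here, so the resulting interval need not match the reported 73.8%-87.4%, which may come from, e.g., a per-prompt bootstrap.

```python
from math import sqrt

def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n at confidence z."""
    p = k / n
    denom = 1 + z**2 / n
    center = p + z**2 / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - margin) / denom, (center + margin) / denom

# ~81.0% of 253 state-changing actions corresponds to about 205 false negatives
lo, hi = wilson_ci(205, 253)
print(f"FN rate: {205/253:.1%}, 95% Wilson CI: [{lo:.1%}, {hi:.1%}]")
```

The Wilson interval is preferred over the simpler normal approximation for proportions near 0 or 1, where the normal interval can extend past the [0, 1] bounds.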
Entities
Institutions
- Anthropic