GPT-5.5 matches Mythos Preview in UK cybersecurity tests
The UK's AI Security Institute (AISI) reports that OpenAI's GPT-5.5, publicly launched last week, performs on par with Anthropic's Mythos Preview in cybersecurity assessments. AISI evaluated both models on 95 Capture the Flag challenges spanning web exploitation, reverse engineering, and cryptography. On Expert-level tasks, GPT-5.5 achieved an average pass rate of 71.4%, slightly ahead of Mythos Preview's 68.6%, a difference within the margin of error. GPT-5.5 also completed a Rust binary disassembler task in 10 minutes and 22 seconds without human help, at an API cost of $1.73. In "The Last Ones" (TLO), a test that simulates a data extraction attack, GPT-5.5 succeeded in 3 of 10 trials and Mythos Preview in 2 of 10; no previous model had succeeded on TLO. Both models failed AISI's "Cooling Tower" power plant simulation, as has every AI system assessed before them. Anthropic initially limited Mythos Preview's release to key industry partners because of cybersecurity concerns.
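The TLO result rests on only 10 trials per model, so a one-trial gap says little about which model is stronger. The sketch below illustrates this with a 95% Wilson score interval for each success rate. It is an illustration only: the article does not say how AISI computes its margin of error, and the function name `wilson_interval` and the choice of the Wilson method are assumptions made here.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%).

    Hypothetical helper for illustration; not AISI's published method.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return center - half_width, center + half_width


# TLO counts as reported in the article: 3/10 and 2/10.
gpt_low, gpt_high = wilson_interval(3, 10)
mythos_low, mythos_high = wilson_interval(2, 10)

print(f"GPT-5.5 TLO success rate:        {gpt_low:.1%} to {gpt_high:.1%}")
print(f"Mythos Preview TLO success rate: {mythos_low:.1%} to {mythos_high:.1%}")

# The two intervals overlap heavily, so 3/10 vs 2/10 does not separate the models.
intervals_overlap = gpt_low <= mythos_high and mythos_low <= gpt_high
print("Intervals overlap:", intervals_overlap)
```

With samples this small, both intervals span several tens of percentage points, which is consistent with reading the two models as roughly tied on TLO.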
Key facts
- AISI evaluated GPT-5.5 and Mythos Preview on 95 Capture the Flag cybersecurity challenges
- GPT-5.5 passed 71.4% of Expert tasks, Mythos Preview 68.6%
- GPT-5.5 solved a Rust binary disassembler challenge in 10m22s at $1.73 cost
- On the TLO test, GPT-5.5 succeeded 3/10 times, Mythos Preview 2/10
- No previous model had ever succeeded on TLO
- Both models failed the Cooling Tower power plant simulation
- Anthropic restricted Mythos Preview to critical industry partners
- GPT-5.5 launched publicly last week
Entities
Institutions
- Anthropic
- OpenAI
- UK AI Security Institute (AISI)
Locations
- United Kingdom