ARTFEED — Contemporary Art Intelligence

LLM Agents Show Bias in Cyber-Attack Selection, New Benchmark Reveals

ai-technology · 2026-05-11

A recent study finds that large language model (LLM) agents used for offensive cybersecurity show a consistent bias in attack selection, concentrating on a few attack families even when prompts change. The researchers introduce CyBiasBench, a benchmark of 630 sessions that evaluates five agents on three targets under four prompt conditions, spanning ten attack families. The results show clear biases: certain attack families dominate agents' choices, and the entropy of each agent's attack-family distribution varies widely. The bias appears to be a property of the agents themselves rather than a reflection of how successful the attacks are. The authors also report a bias momentum effect, in which agents resist changes to their favored attack strategies.
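
The notion of "entropy in the distribution of attack families" can be made concrete with a small calculation: lower entropy means an agent's choices are concentrated on a few families, higher entropy means they are spread more evenly. The paper's exact metric is not reproduced here; the following is a minimal sketch in Python, assuming hypothetical family labels and session counts chosen purely for illustration.

    import math
    from collections import Counter

    def attack_family_entropy(choices: list[str]) -> float:
        """Shannon entropy (in bits) of an agent's attack-family choices.

        0.0 means the agent always picks the same family; larger values
        mean its choices are spread more evenly across families.
        """
        counts = Counter(choices)
        total = sum(counts.values())
        entropy = 0.0
        for count in counts.values():
            p = count / total
            entropy -= p * math.log2(p)
        return entropy

    # Hypothetical session log for one agent (labels are illustrative only).
    sessions = ["sql_injection"] * 18 + ["xss"] * 3 + ["path_traversal"] * 1
    print(f"entropy = {attack_family_entropy(sessions):.2f} bits")  # low value: heavily biased

An agent that spread the same 22 sessions evenly across the ten attack families would score roughly 3.3 bits, so a value well below that signals the kind of concentration the study describes.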

Key facts

  • LLM agents in offensive cybersecurity show attack-selection bias
  • CyBiasBench benchmark includes 630 sessions
  • Evaluates five agents on three targets and four prompt conditions
  • Ten attack families are tested
  • Bias is a trait of agents, not linked to success rate
  • Bias momentum effect observed where agents resist change
  • Study published on arXiv with ID 2605.07830
  • Research reveals distinct attack patterns across agents

Entities

Institutions

  • arXiv

Sources