ARTFEED — Contemporary Art Intelligence

LLM Agents Show Bias in Cyber-Attack Selection, New Benchmark Reveals

ai-technology · 2026-05-11

A recent study finds that large language model (LLM) agents used for offensive cybersecurity show a consistent bias in attack selection, concentrating on a few attack families even when prompts change. The researchers introduce CyBiasBench, a benchmark of 630 sessions that evaluates five agents on three targets under four prompt conditions, spanning ten attack families. The results show clear biases: certain attack families dominate agents' choices, and the entropy of each agent's attack-family distribution varies widely. The bias appears to be a property of the agents themselves rather than a reflection of how successful the attacks are. The authors also report a bias momentum effect, in which agents resist changes to their favored attack strategies.
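
The notion of "entropy in the distribution of attack families" can be made concrete with a small calculation: lower entropy means an agent's choices are concentrated on a few families, higher entropy means they are spread more evenly. The paper's exact metric is not reproduced here; the following is a minimal sketch in Python, assuming hypothetical family labels and session counts chosen purely for illustration.

    import math
    from collections import Counter

    def attack_family_entropy(choices: list[str]) -> float:
        """Shannon entropy (in bits) of an agent's attack-family choices.

        0.0 means the agent always picks the same family; larger values
        mean its choices are spread more evenly across families.
        """
        counts = Counter(choices)
        total = sum(counts.values())
        entropy = 0.0
        for count in counts.values():
            p = count / total
            entropy -= p * math.log2(p)
        return entropy

    # Hypothetical session log for one agent (labels are illustrative only).
    sessions = ["sql_injection"] * 18 + ["xss"] * 3 + ["path_traversal"] * 1
    print(f"entropy = {attack_family_entropy(sessions):.2f} bits")  # low value: heavily biased

An agent that spread the same 22 sessions evenly across the ten attack families would score roughly 3.3 bits, so a value well below that signals the kind of concentration the study describes.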

Key facts

  • LLM agents in offensive cybersecurity show attack-selection bias
  • CyBiasBench benchmark includes 630 sessions
  • Evaluates five agents on three targets and four prompt conditions
  • Ten attack families are tested
  • Bias is a trait of agents, not linked to success rate
  • Bias momentum effect observed where agents resist change
  • Study published on arXiv with ID 2605.07830
  • Research reveals distinct attack patterns across agents

Entities

Institutions

  • arXiv

Sources