ARTFEED — Contemporary Art Intelligence

Survival Analysis Framework Quantifies LLM Safety Degradation Under Repeated Attacks

ai-technology · 2026-05-14

A recent study posted to arXiv introduces a survival analysis framework for assessing LLM vulnerability to jailbreaks over time, moving beyond simple success-or-failure metrics. The research treats time-to-jailbreak as a survival outcome, enabling estimation of hazard functions, survival curves, and associated risk factors. The analysis evaluates three LLMs on a selection of HarmBench prompts across three attack categories, revealing distinct vulnerability profiles; in particular, one model's safety degrades rapidly under iterative attacks.
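
The core modeling step can be illustrated with a short sketch: treat the number of attack turns until the first successful jailbreak as an event time, and record runs that never succeed as censored. The example below uses the lifelines library with entirely hypothetical attack-log numbers; it is a minimal illustration of the general technique, not the paper's code or data.

    # Minimal Kaplan-Meier sketch for time-to-jailbreak (hypothetical data, not the paper's code).
    from lifelines import KaplanMeierFitter

    # Each record: attack iterations until the first successful jailbreak, and whether a
    # jailbreak was actually observed (0 = censored, i.e. attack budget exhausted while
    # the guardrails still held).
    durations = [3, 7, 2, 10, 10, 5, 1, 10, 4, 10]   # attack turns (hypothetical)
    observed  = [1, 1, 1,  0,  0, 1, 1,  0, 1,  0]   # 1 = jailbreak occurred, 0 = censored

    kmf = KaplanMeierFitter()
    kmf.fit(durations, event_observed=observed, label="model_A")

    # Survival function S(t): probability the model is still un-jailbroken after t attack turns.
    print(kmf.survival_function_)
    print("Median turns to jailbreak:", kmf.median_survival_time_)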

Key facts

  • arXiv paper 2605.12869 proposes survival analysis for LLM safety evaluation.
  • Framework models time-to-jailbreak as a survival outcome.
  • Estimates hazard functions, survival curves, and risk factors (see the sketch after this list).
  • Evaluates three LLMs on HarmBench prompts across three attack categories.
  • Models show distinct vulnerability profiles, with one degrading rapidly under iterative attacks.
  • Existing frameworks report binary success/failure metrics, missing temporal dynamics.
  • The work is preliminary and focuses on adversarial jailbreak attacks.
  • LLMs remain vulnerable to attacks that circumvent safety guardrails.
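
Risk factors such as attack category could in principle be estimated with a Cox proportional hazards regression, as the bullets above suggest. The sketch below uses lifelines with a hypothetical "iterative attack" indicator and made-up run data; it is an assumption about how such a regression might look, not the authors' implementation.

    # Hedged Cox proportional-hazards sketch for attack-level risk factors
    # (hypothetical data and column names; not the paper's implementation).
    import pandas as pd
    from lifelines import CoxPHFitter

    # One row per attack run: duration in attack turns, event flag, and a covariate.
    df = pd.DataFrame({
        "turns":      [3, 7, 2, 10, 5, 1, 10, 4, 8, 6],
        "jailbroken": [1, 1, 1,  0, 1, 1,  0, 1, 0, 1],
        "iterative":  [1, 0, 1,  0, 1, 1,  0, 1, 1, 0],  # 1 = iterative attack category
    })

    cph = CoxPHFitter()
    cph.fit(df, duration_col="turns", event_col="jailbroken")

    # Hazard ratios > 1 indicate covariates associated with faster jailbreak.
    cph.print_summary()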

Entities

Institutions

  • arXiv
  • HarmBench

Sources