Survival Analysis Framework Quantifies LLM Safety Degradation Under Repeated Attacks

ai-technology · 2026-05-14

A recent study published on arXiv introduces a survival analysis framework for assessing how vulnerable LLMs are to jailbreaks over time, rather than reporting only whether an attack succeeded or failed. The research treats time-to-jailbreak as a survival outcome, which allows estimation of hazard functions, survival curves, and associated risk factors. The analysis tested three LLMs on a selection of HarmBench prompts across three attack categories and found distinct vulnerability profiles, with one model degrading rapidly under iterative attacks.
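
To make the idea of time-to-jailbreak as a survival outcome concrete, here is a minimal sketch of a Kaplan-Meier survival curve in Python. It is an illustration, not the paper's implementation: the function name, the choice of attack attempts as the time unit, and the sample data are all hypothetical. A prompt that was never jailbroken before the attack budget ran out is treated as censored.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[int], events: List[bool]) -> List[Tuple[int, float]]:
    """Return (time, survival probability) pairs at each observed jailbreak time.

    durations[i]: number of attack attempts on prompt i until it was jailbroken
                  or the attack stopped (assumed time unit, not from the paper).
    events[i]:    True if prompt i was jailbroken at durations[i],
                  False if the observation was censored (attack budget ran out).
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    survival = 1.0
    curve = []
    # Only times at which at least one jailbreak happened change the estimate.
    event_times = sorted({d for d, e in zip(durations, events) if e})
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)  # still unbroken just before t
        failed = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1.0 - failed / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical observations for one model: six prompts, three of them censored.
durations = [1, 2, 2, 3, 5, 5]
events = [True, True, False, True, False, False]
curve = kaplan_meier(durations, events)

for t, s in curve:
    print(f"after attempt {t}: estimated share of prompts still safe = {s:.3f}")
```

A curve that drops steeply in the first few attempts would correspond to the kind of rapid degradation the study describes, which a single success rate cannot show.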

Key facts

arXiv paper 2605.12869 proposes survival analysis for LLM safety evaluation.
Framework models time-to-jailbreak as a survival outcome.
Estimates hazard functions, survival curves, and risk factors.
Evaluates three LLMs on HarmBench prompts across three attack categories.
Models show distinct vulnerability profiles, with one degrading rapidly under iterative attacks.
Existing frameworks report binary success/failure metrics, missing temporal dynamics.
The work is preliminary and focuses on adversarial jailbreak attacks.
LLMs remain vulnerable to attacks that circumvent safety guardrails.
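
The hazard function mentioned in the facts above can be sketched in discrete time as the share of still-unbroken prompts that are jailbroken at each attack round. The Python sketch below compares two attack categories under that assumption. The function name, the category labels ("iterative" and the placeholder "other"), and the data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def discrete_hazard(durations: List[int], events: List[bool]) -> Dict[int, float]:
    """Estimate the per-round hazard: P(jailbroken at round t | unbroken before t).

    durations[i]: attack round at which prompt i was jailbroken or observation ended.
    events[i]:    True if jailbroken at that round, False if censored.
    """
    if not durations or len(durations) != len(events):
        raise ValueError("durations and events must be non-empty and equal in length")

    hazard = {}
    for t in range(1, max(durations) + 1):
        at_risk = sum(1 for d in durations if d >= t)  # at least 1 for t <= max
        failed = sum(1 for d, e in zip(durations, events) if d == t and e)
        hazard[t] = failed / at_risk
    return hazard


# Hypothetical observations per attack category (labels are placeholders).
observations = {
    "iterative": ([1, 1, 2, 3], [True, True, True, False]),
    "other": ([2, 4, 4, 4], [True, False, False, False]),
}

hazards = {
    category: discrete_hazard(durations, events)
    for category, (durations, events) in observations.items()
}

for category, hazard in hazards.items():
    print(category, {t: round(h, 2) for t, h in hazard.items()})
```

In this made-up data the iterative category has a high hazard in the first two rounds, while the other category stays low throughout. That difference in shape is the kind of temporal pattern a binary success metric would hide.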
