ARTFEED — Contemporary Art Intelligence

CarryOnBench: Testing LLM Utility Recovery After Safety Refusals

ai-technology · 2026-05-01

Researchers have launched CarryOnBench, the first interactive benchmark designed to evaluate whether large language models (LLMs) can recover utility when well-meaning users clarify their intentions after an initial safety refusal. The study starts from 398 queries that appear harmful but carry benign underlying intents, then simulates 5,970 dialogues by varying the sequence of user follow-ups. Fourteen models were assessed on both intent-aligned utility and safety across 1,866 distinct conversation flows of 4 to 12 turns, yielding 23,880 model responses. A new metric, Ben-Util, uses atomic checklist items to measure how fully each response satisfies the user's benign information need. At the first turn, models satisfy only 10.5–37.6% of those needs, exposing a gap between safety and helpfulness in multi-turn exchanges.
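The article does not detail how Ben-Util is computed, but a checklist-based metric of this kind can be sketched as the fraction of atomic checklist items a response covers. The function name, checklist structure, and substring matcher below are illustrative assumptions, not the authors' implementation (the paper likely uses a stronger judge than keyword matching):

```python
# Hypothetical sketch of a checklist-based utility score in the spirit
# of Ben-Util: each atomic item encodes one benign information need,
# and the score is the fraction of items the response covers.
# Substring matching stands in for whatever judge the authors use.

def ben_util(response: str, checklist: list[str]) -> float:
    """Return the fraction of checklist items covered by the response."""
    if not checklist:
        return 0.0
    text = response.lower()
    satisfied = sum(1 for item in checklist if item.lower() in text)
    return satisfied / len(checklist)

# Toy example: a response covering 2 of 3 benign information needs.
checklist = ["general mechanism", "safety precautions", "legal alternatives"]
response = ("At a high level, the general mechanism works like X; "
            "standard safety precautions include wearing gloves.")
print(ben_util(response, checklist))  # 2 of 3 items matched -> ~0.667
```

Aggregating this score per turn across a simulated dialogue would show whether utility recovers after a refusal once the user clarifies a benign intent.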

Key facts

  • CarryOnBench is the first interactive benchmark for LLM utility recovery after safety refusals.
  • It starts from 398 seemingly harmful queries with benign underlying intents.
  • 5,970 dialogues were simulated by varying the sequence of user follow-ups.
  • 14 models were evaluated on both intent-aligned utility and safety.
  • The benchmark includes 1,866 different conversation flows of 4–12 turns.
  • A total of 23,880 model responses were generated.
  • Ben-Util is a checklist-based metric for evaluating fulfillment of benign information needs.
  • At turn one, models fulfill only 10.5–37.6% of the user's benign information need.

Entities

Institutions

  • arXiv

Sources