XL-SafetyBench: Cross-Cultural LLM Safety Benchmark with 5,500 Test Cases
A new benchmark, XL-SafetyBench, evaluates LLM safety across 10 country-language pairs with 5,500 test cases. It includes a Jailbreak Benchmark of adversarial prompts and a Cultural Benchmark embedding local sensitivities in innocuous requests. Each item is built via a multi-stage pipeline with LLM-assisted discovery, automated validation, and dual native-speaker annotators per country. Two new metrics, Neutral-Safe Rate (NSR) and Cultural Sensitivity Rate (CSR), complement Attack Success Rate (ASR) to distinguish principled refusal from comprehension failure. The benchmark tests 10 frontier and 27 local LLMs.
Key facts
- XL-SafetyBench includes 5,500 test cases across 10 country-language pairs.
- It comprises a Jailbreak Benchmark and a Cultural Benchmark.
- Each item is built via LLM-assisted discovery, automated validation, and two native-speaker annotators per country.
- Two new metrics, Neutral-Safe Rate (NSR) and Cultural Sensitivity Rate (CSR), complement Attack Success Rate (ASR).
- Evaluates 10 frontier and 27 local LLMs.
- Addresses English-centric bias in current LLM safety benchmarks.
- Focuses on country-specific harms and culturally embedded sensitivities.
- Published on arXiv with ID 2605.05662.
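The three metrics above can be sketched as simple success rates over judged model responses. This is a hypothetical illustration, not the paper's implementation: the `Judgment` type, outcome labels, and the exact definitions (ASR as the fraction of jailbreak prompts yielding harmful output, NSR as the fraction of neutral prompts answered helpfully rather than over-refused, CSR as the fraction of culturally sensitive prompts where the model recognizes the sensitivity) are assumptions for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    """One judged model response (hypothetical schema)."""
    prompt_type: str  # "jailbreak", "neutral", or "cultural"
    outcome: str      # e.g. "harmful", "safe_answer", "refusal", "sensitive_aware"

def rate(judgments, prompt_type, success_outcome):
    """Fraction of prompts of a given type whose outcome matches success_outcome."""
    subset = [j for j in judgments if j.prompt_type == prompt_type]
    if not subset:
        return 0.0
    return sum(j.outcome == success_outcome for j in subset) / len(subset)

judgments = [
    Judgment("jailbreak", "refusal"),          # attack blocked
    Judgment("jailbreak", "harmful"),          # attack succeeded
    Judgment("neutral", "safe_answer"),        # innocuous request served
    Judgment("neutral", "refusal"),            # over-refusal lowers NSR
    Judgment("cultural", "sensitive_aware"),   # local sensitivity recognized
]

asr = rate(judgments, "jailbreak", "harmful")         # 0.5
nsr = rate(judgments, "neutral", "safe_answer")       # 0.5
csr = rate(judgments, "cultural", "sensitive_aware")  # 1.0
```

Keeping NSR separate from ASR is what lets the benchmark distinguish principled refusal (high on both) from comprehension failure (a model that refuses everything scores well on ASR but poorly on NSR).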