ROK-FORTRESS: New Benchmark Tests LLM Safety Across Geopolitical Contexts
Researchers have released ROK-FORTRESS, a bilingual benchmark for assessing the safety of large language models (LLMs) in critical National Security and Public Safety (NSPS) contexts. Focused on the English–Korean language pair and the U.S.–ROK geopolitical relationship, the benchmark uses a transcreation matrix to disentangle the effects of language from those of geopolitical grounding: adversarial intents are evaluated under controlled combinations of English and Korean with U.S. and Korean entities, institutions, and operational specifics. To capture dual-use risk, each adversarial prompt is paired with a benign counterpart. The dataset, available on Hugging Face, fills a gap in multilingual safety evaluation, which often relies on translation-only benchmarks that lack geopolitical grounding. The research offers empirical evidence of how language and geopolitical context interact, extending the scope beyond previously studied language pairs.
Key facts
- ROK-FORTRESS is a bilingual NSPS benchmark for LLM safety.
- It uses the English–Korean language pair and U.S.–ROK geopolitical axis.
- A transcreation matrix separates language and geopolitical grounding effects.
- Adversarial intents are tested under controlled language and entity combinations.
- Each adversarial prompt is paired with a dual-use benign prompt.
- The dataset is publicly available on Hugging Face.
- It addresses gaps in multilingual safety evaluations that use translation-only benchmarks.
- The study provides empirical evidence of language and geopolitical context interaction.
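The transcreation-matrix design described above can be sketched as a small data-generation routine: each adversarial seed expands into one record per (language, geopolitical grounding) cell, with a benign dual-use counterpart attached to each. This is a minimal illustrative sketch; the field names, ID scheme, and template dictionaries are assumptions, not the actual ROK-FORTRESS schema.

```python
from itertools import product

# Hypothetical sketch of a transcreation matrix. The 2x2 design crosses
# language (English, Korean) with geopolitical grounding (U.S., ROK entities),
# so one seed yields four controlled prompt variants, each paired with a
# benign dual-use counterpart. All names here are illustrative.
LANGUAGES = ("en", "ko")
GROUNDINGS = ("US", "ROK")

def expand_seed(seed_id: str, adversarial: dict, benign: dict) -> list:
    """Produce one record per (language, grounding) cell of the matrix.

    `adversarial` and `benign` map (language, grounding) tuples to prompt text.
    """
    records = []
    for lang, grounding in product(LANGUAGES, GROUNDINGS):
        records.append({
            "id": f"{seed_id}-{lang}-{grounding}",
            "language": lang,
            "grounding": grounding,  # which country's entities/institutions appear
            "adversarial_prompt": adversarial[(lang, grounding)],
            "benign_prompt": benign[(lang, grounding)],  # dual-use counterpart
        })
    return records
```

Holding the adversarial intent fixed while varying only the matrix cell is what lets the benchmark attribute safety differences to language versus geopolitical grounding rather than to prompt content.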
Entities
Institutions
- Scale AI
- Hugging Face
- arXiv
Locations
- United States
- South Korea