IndicSafe: Benchmarking LLM Safety Across 12 Indic Languages
IndicSafe is the first comprehensive assessment of large language model (LLM) safety across 12 Indic languages, which together are spoken by more than 1.2 billion people. The researchers evaluated 10 leading LLMs on 6,000 culturally grounded prompts covering caste, religion, gender, health, and politics. The results reveal substantial safety drift across languages: cross-language agreement is only 12.8%, and the variance in SAFE rates across languages exceeds 17%. Some models over-refuse harmless prompts written in low-resource scripts or over-flag politically sensitive topics, while others fail to flag unsafe outputs. To quantify these gaps, the study uses prompt-level entropy, category bias scores, and multilingual consistency indices.
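The summary names prompt-level entropy and cross-language agreement but does not give their formulas. The Python sketch below shows one plausible way to compute them. It assumes that each prompt receives one categorical label per language. The label names (`SAFE`, `UNSAFE`, `REFUSE`), the language codes, and the function names are hypothetical and are not IndicSafe's published schema. Agreement is treated here as the share of prompts that receive the same label in every language, and the study's exact definitions may differ.

```python
from collections import Counter
from math import log2

# Hypothetical input format (an assumption, not IndicSafe's schema):
# prompt ID -> {language code -> label assigned to one model's output}.
Labels = dict[str, str]


def prompt_entropy(labels: Labels) -> float:
    """Shannon entropy (in bits) of one prompt's label distribution across languages.

    0.0 means every language got the same label; higher values mean more drift.
    """
    counts = Counter(labels.values())
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return sum(-p * log2(p) for p in probs)


def cross_language_agreement(results: dict[str, Labels]) -> float:
    """Fraction of prompts labelled identically in every language (assumed definition)."""
    if not results:
        return 0.0
    agreed = sum(1 for labels in results.values() if len(set(labels.values())) == 1)
    return agreed / len(results)


# Toy data for illustration only; not taken from the IndicSafe dataset.
results = {
    "p1": {"hi": "SAFE", "ta": "SAFE", "bn": "SAFE"},
    "p2": {"hi": "SAFE", "ta": "REFUSE", "bn": "UNSAFE"},
}
```

With this toy data, `cross_language_agreement(results)` is 0.5. `prompt_entropy(results["p1"])` is 0.0 because all three languages agree. `prompt_entropy(results["p2"])` is log2(3), about 1.585 bits, because each language received a different label.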
Key facts
- First systematic evaluation of LLM safety across 12 Indic languages
- Languages spoken by over 1.2 billion people
- Dataset of 6,000 culturally grounded prompts
- Topics include caste, religion, gender, health, and politics
- 10 leading LLMs assessed
- Cross-language agreement is just 12.8%
- SAFE rate variance exceeds 17% across languages
- Some models over-refuse or under-refuse depending on the language and its script
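The SAFE-rate figure in the key facts could describe either a range or a statistical variance across languages, and the summary does not say which. The minimal Python sketch below uses the same hypothetical per-language label format. It computes the SAFE rate for each language and reports both the max-min spread and the population variance. The function names and the `SAFE` label string are illustrative assumptions.

```python
from statistics import pvariance

# Hypothetical input format (an assumption): prompt ID -> {language code -> label}.
Results = dict[str, dict[str, str]]


def safe_rates(results: Results, safe_label: str = "SAFE") -> dict[str, float]:
    """Per-language share of prompts whose label equals `safe_label` (hypothetical name)."""
    totals: dict[str, int] = {}
    safe: dict[str, int] = {}
    for labels in results.values():
        for lang, label in labels.items():
            totals[lang] = totals.get(lang, 0) + 1
            if label == safe_label:
                safe[lang] = safe.get(lang, 0) + 1
    return {lang: safe.get(lang, 0) / n for lang, n in totals.items()}


def safe_rate_spread(rates: dict[str, float]) -> tuple[float, float]:
    """Return (max - min range, population variance) of the per-language SAFE rates."""
    values = list(rates.values())
    return max(values) - min(values), pvariance(values)


# Toy data for illustration only; not taken from the IndicSafe dataset.
results = {
    "p1": {"hi": "SAFE", "ta": "SAFE"},
    "p2": {"hi": "SAFE", "ta": "REFUSE"},
}
rates = safe_rates(results)
```

Here `rates` is `{"hi": 1.0, "ta": 0.5}`, and `safe_rate_spread(rates)` returns a range of 0.5 (50 percentage points) and a variance of 0.0625. Reporting both values makes it clear which kind of spread a headline percentage refers to.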
Entities
Locations
- South Asia