ARTFEED — Contemporary Art Intelligence

Multilingual Framework Detects Reclaimed Slurs in LGBTQ+ Discourse

ai-technology · 2026-05-14

A new multi-stage framework identifies reclaimed slurs in multilingual social media posts. It distinguishes reclamatory from non-reclamatory uses of LGBTQ+-related slurs in tweets written in English, Spanish, and Italian. It addresses three challenges: limited labeled data, class imbalance, and cross-linguistic variation in sentiment. The framework combines four techniques: cross-validation for model selection, back-translation for meaning-preserving data augmentation, dynamic epoch-level undersampling within inductive transfer learning, and masked language modeling to incorporate domain-specific knowledge. Eight multilingual embedding models were evaluated, and XLM-RoBERTa was chosen as the foundation model on the basis of its macro-averaged F1 score. Back-translation with GPT-4o-mini effectively tripled the size of the training set.
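The summary does not spell out how dynamic epoch-level undersampling works. A minimal sketch follows, assuming it means drawing a fresh class-balanced subset at the start of every epoch. Every example of the smallest class is kept, and an equal-size random sample is redrawn from each larger class. Each epoch stays balanced, yet across epochs the model still sees most of the majority-class data. The helper `epoch_undersample`, the binary labels, and the toy tweets are hypothetical illustrations, not the authors' code.

```python
import random
from collections import Counter


def epoch_undersample(examples, labels, epoch, seed=0):
    """Return a class-balanced (example, label) subset for one epoch.

    Hypothetical helper: keeps every example of the smallest class and,
    for each larger class, redraws an equally sized random sample per epoch.
    """
    target = min(Counter(labels).values())  # size of the minority class
    rng = random.Random(seed + epoch)  # new draw each epoch, yet reproducible
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    subset = []
    for y, items in by_class.items():
        # Minority class is kept whole; larger classes are subsampled.
        chosen = items if len(items) == target else rng.sample(items, target)
        subset.extend((x, y) for x in chosen)
    rng.shuffle(subset)
    return subset


# Toy data: 2 reclamatory (1) vs 6 non-reclamatory (0) tweets.
tweets = [f"tweet_{i}" for i in range(8)]
labels = [1, 1, 0, 0, 0, 0, 0, 0]
for epoch in range(3):
    print(epoch, sorted(epoch_undersample(tweets, labels, epoch)))
```

Seeding the generator with the epoch number keeps runs reproducible while still varying which majority-class tweets appear in each epoch.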

Key facts

  • Framework detects reclaimed slurs in multilingual social media
  • Focuses on LGBTQ+-related slurs in English, Spanish, and Italian
  • Addresses data scarcity, class imbalance, cross-linguistic variation
  • Uses cross-validation, back-translation, transfer learning, masked language modeling
  • XLM-RoBERTa selected as foundation model
  • GPT-4o-mini back-translation tripled training corpus
  • Evaluated eight multilingual embedding models
  • Published on arXiv under ID 2605.13415
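Choosing one foundation model out of several candidates by macro-averaged F1 across cross-validation folds can be sketched as below. This is a minimal illustration under stated assumptions: each candidate's fold predictions are already available, and the winner is the model with the highest mean macro-F1. The names `macro_f1`, `select_model`, `model_a`, and `model_b` and the toy labels are hypothetical and are not results from the paper. Macro-F1 averages per-class F1 without weighting by class size, so a model that never predicts the rare class is penalized even when its accuracy looks reasonable.

```python
from statistics import mean


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaged F1)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def select_model(fold_results):
    """Pick the candidate with the highest mean macro-F1 across CV folds.

    fold_results maps a model name to a list of (y_true, y_pred) pairs,
    one pair per cross-validation fold.
    """
    averages = {
        name: mean(macro_f1(t, p) for t, p in folds)
        for name, folds in fold_results.items()
    }
    best = max(averages, key=averages.get)
    return best, averages


# Toy example: model_a always predicts the majority class (0).
y = [1, 0, 0, 0]
results = {
    "model_a": [(y, [0, 0, 0, 0]), (y, [0, 0, 0, 0])],
    "model_b": [(y, [1, 0, 1, 0]), (y, [1, 0, 1, 0])],
}
best, scores = select_model(results)
print(best)  # model_b
```

Both toy models reach 75% accuracy, but `model_a` scores a macro-F1 of about 0.43 against about 0.73 for `model_b`, because it never detects the minority class.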

Entities

Institutions

  • arXiv

Sources