Gender Erasure in English-to-Hindi Machine Translation
A new preprint posted to arXiv (2605.27654) reports that generative translation systems frequently fail to preserve gender cues when translating from English to Hindi. The researchers built a 37,345-instance benchmark spanning twelve categories and tested five systems, finding that gender is often erased through ergative and honorific constructions. To address this, they introduced two inference-time interventions. The Source-Aware Reranker (SAR) prefers candidate translations that avoid gender-neutralizing syntax. The Phenomenon-Aware Reranker (PAR) preserves gender via targeted lexical marking even when ergative syntax remains. PAR improved accuracy on target subsets for GPT-4o-mini and Sarvam models. The work frames translation as a cultural technology in which socially meaningful cues must be faithfully rendered within the target language's grammatical system.
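To make the reranking idea concrete, here is a minimal sketch of a SAR-style preference, not the authors' implementation. It assumes that "gender-neutralizing syntax" can be approximated by detecting the Hindi ergative marker ने, which in perfective transitive clauses makes the verb agree with the object rather than the subject. The function names, the pronoun list, and the example sentences are all illustrative assumptions.

```python
# Hypothetical sketch of a SAR-style reranker; the paper's actual scoring is not reproduced here.
# Assumption: an ergative-marked subject is treated as a proxy for gender-neutralizing syntax.

ERGATIVE_MARKER = "ने"  # Hindi ergative postposition

# Common pronoun forms that fuse with the ergative marker (illustrative, not exhaustive).
ERGATIVE_PRONOUNS = {
    "मैंने", "तुमने", "हमने", "आपने", "उसने", "इसने",
    "उन्होंने", "इन्होंने", "किसने", "जिसने",
}


def has_ergative_subject(sentence: str) -> bool:
    """Crude token-level check for an ergative-marked subject."""
    tokens = [tok.strip("।,.?!") for tok in sentence.split()]
    return any(tok == ERGATIVE_MARKER or tok in ERGATIVE_PRONOUNS for tok in tokens)


def rerank(candidates: list[str]) -> list[str]:
    """Move candidates without ergative marking ahead of those with it.

    sorted() is stable, so the translation system's original ordering is
    kept within each group.
    """
    return sorted(candidates, key=has_ergative_subject)


if __name__ == "__main__":
    # Two hypothetical candidates for "She has read the book."
    candidates = [
        "उसने किताब पढ़ी है।",    # ergative: verb agrees with the object
        "वह किताब पढ़ चुकी है।",  # no ergative: verb agrees with the subject
    ]
    print(rerank(candidates)[0])
```

A token match like this is only a stand-in for real syntactic analysis. A practical system would also need to check that the preferred candidate keeps the meaning of the source sentence.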
Key facts
- Preprint posted on arXiv with ID 2605.27654
- Benchmark contains 37,345 instances across twelve categories
- Five generative translation systems tested
- Gender erasure occurs through ergative and honorific constructions
- SAR intervention prefers candidates avoiding gender-neutralizing syntax
- PAR intervention preserves gender via lexical marking
- PAR improved accuracy on target subsets for GPT-4o-mini and Sarvam models
- Translation framed as a cultural technology
Entities
Institutions
- arXiv