ARTFEED — Contemporary Art Intelligence

Gender Erasure in English-to-Hindi Machine Translation

ai-technology · 2026-05-28

A new study posted to arXiv (2605.27654) finds that generative translation systems frequently fail to preserve gender cues when translating from English to Hindi. The researchers built a 37,345-instance benchmark spanning twelve categories and tested five systems, finding that gender is often erased through ergative and honorific constructions. To address this, they introduced two inference-time interventions: the Source-Aware Reranker (SAR), which prefers candidate translations that avoid gender-neutralizing syntax, and the Phenomenon-Aware Reranker (PAR), which preserves gender via targeted lexical marking even when ergative syntax remains. PAR improved accuracy on target subsets for GPT-4o-mini and Sarvam models. The work frames translation as a cultural technology in which socially meaningful cues must be rendered faithfully within the grammatical system of the target language.

Key facts

  • Study published on arXiv with ID 2605.27654
  • Benchmark contains 37,345 instances across twelve categories
  • Five generative translation systems tested
  • Gender erasure occurs through ergative and honorific constructions
  • SAR intervention prefers candidates avoiding gender-neutralizing syntax
  • PAR intervention preserves gender via lexical marking
  • PAR improved accuracy on target subsets for GPT-4o-mini and Sarvam models
  • Translation framed as a cultural technology
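
To make the reranking idea concrete, here is a minimal Python sketch of an inference-time reranker in the spirit of SAR. It is an illustration under stated assumptions, not the study's implementation: the `Candidate` type, the `penalty` parameter, the scores, and the surface heuristic (treating any token ending in the Hindi ergative postposition "ने" as a sign of gender-neutralizing syntax) are all hypothetical. The two candidate sentences are hand-written examples and differ in aspect, so they are not exact equivalents.

```python
from dataclasses import dataclass

# Assumption: a crude surface cue for ergative syntax. The postposition "ने"
# can stand alone or be fused onto a pronoun (e.g. "उसने"). This heuristic
# also matches unrelated words that happen to end in "ने", so it is only a toy.
ERGATIVE_MARKER = "ने"


@dataclass
class Candidate:
    text: str            # candidate Hindi translation
    model_score: float   # hypothetical score from the translation system (higher is better)


def has_ergative_marker(text: str) -> bool:
    """Return True if any whitespace-separated token ends with the ergative marker."""
    return any(token.endswith(ERGATIVE_MARKER) for token in text.split())


def rerank(candidates: list[Candidate], penalty: float = 1.0) -> list[Candidate]:
    """Sort candidates by model score minus a penalty for ergative syntax."""
    def adjusted(c: Candidate) -> float:
        return c.model_score - (penalty if has_ergative_marker(c.text) else 0.0)
    return sorted(candidates, key=adjusted, reverse=True)


# Hypothetical candidates for an English source with a female subject.
candidates = [
    # Ergative: the verb agrees with the object, so the subject's gender is not marked.
    Candidate("उसने किताब पढ़ी।", model_score=-1.0),
    # Non-ergative: the feminine forms "रही थी" mark the subject's gender.
    Candidate("वह किताब पढ़ रही थी।", model_score=-1.4),
]

best = rerank(candidates)[0]
print(best.text)
```

With the default penalty, the gender-marked candidate moves ahead of the higher-scoring ergative one; with `penalty=0.0` the original model ranking is kept. A PAR-style variant would presumably keep the ergative candidate and instead reward explicit lexical gender marking, but the brief gives no detail on how that scoring works.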

Entities

Institutions

  • arXiv

Sources