ARTFEED — Contemporary Art Intelligence

XLGoBench: New Benchmark Detects Cross-Lingual Gaps in LLMs

ai-technology · 2026-06-01

Researchers have introduced XLGoBench, a benchmark of synthetic algorithmic tasks designed to detect cross-lingual skill gaps in large language models (LLMs). The benchmark is commensurate across languages, meaning models must perform the same underlying task in each language. It is also scalable, because tasks can be generated at varying levels of complexity. It is quantifiable, because every answer can be checked objectively for correctness. And it is transparent, because tasks are built from simple templates that can be audited for translation errors. Experiments show that XLGoBench exposes persistent cross-lingual gaps in several state-of-the-art models. Differential performance across languages is a sufficient but not necessary indicator of such gaps: a score difference between languages signals a gap, but equal scores do not rule one out.
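To make the design properties above concrete, here is a minimal sketch of template-based task generation. It is not XLGoBench's code. The sorting task, the `TEMPLATES` dictionary, the three languages, and the `generate_tasks` and `check` helpers are all hypothetical stand-ins chosen for illustration. The sketch shows how one random seed yields the same underlying instance in every language (commensurate), how a `complexity` parameter scales difficulty (scalable), how an exact-match check scores answers (quantifiable), and how a few short templates stay easy to audit for translation errors (transparent).

```python
import random
from dataclasses import dataclass

# Hypothetical templates: one task phrased in three languages.
# Illustrative only; these are not XLGoBench's actual templates.
TEMPLATES = {
    "en": "Sort the following numbers in ascending order: {items}. "
          "Answer with the numbers separated by commas.",
    "es": "Ordena los siguientes números de menor a mayor: {items}. "
          "Responde con los números separados por comas.",
    "de": "Sortiere die folgenden Zahlen aufsteigend: {items}. "
          "Antworte mit den Zahlen, getrennt durch Kommas.",
}


@dataclass(frozen=True)
class Task:
    lang: str
    prompt: str
    answer: list[int]


def generate_tasks(complexity: int, seed: int) -> list[Task]:
    """Build one instance per language that share the same numbers.

    `complexity` is the list length, a stand-in for a difficulty knob.
    """
    rng = random.Random(seed)
    numbers = [rng.randint(0, 99) for _ in range(complexity)]
    items = ", ".join(str(n) for n in numbers)
    answer = sorted(numbers)
    return [
        Task(lang, template.format(items=items), answer)
        for lang, template in TEMPLATES.items()
    ]


def check(task: Task, response: str) -> bool:
    """Objective scoring: parse comma-separated integers, compare exactly."""
    try:
        parsed = [int(token) for token in response.split(",")]
    except ValueError:
        return False  # Unparseable output counts as incorrect.
    return parsed == task.answer


if __name__ == "__main__":
    for task in generate_tasks(complexity=5, seed=0):
        print(task.lang, "|", task.prompt)
```

Because every language variant is derived from the same seed, any accuracy difference between languages on these items cannot be blamed on the items themselves being harder in one language.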

Key facts

  • XLGoBench is a set of synthetic algorithmic tasks.
  • It detects cross-lingual gaps in LLM abilities.
  • Tasks are commensurate across languages.
  • The benchmark is scalable, quantifiable, and transparent.
  • Experiments reveal persistent cross-lingual gaps in state-of-the-art models.
  • Differential performance across languages is a sufficient but not necessary indicator of a cross-lingual gap.
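The "sufficient but not necessary" logic can be illustrated with a paired significance test. The sketch below uses the exact McNemar test, a standard choice for paired pass/fail outcomes. It assumes each item was posed in both languages and scored as correct or incorrect. The source does not say which statistic, if any, XLGoBench uses, so treat this as a swapped-in technique. The `mcnemar_exact` helper is hypothetical. Note the one-directional reading: a small p-value supports a real gap, while a large p-value does not show that no gap exists.

```python
from math import comb


def mcnemar_exact(correct_a: list[bool], correct_b: list[bool]) -> tuple[int, int, float]:
    """Exact McNemar test on paired binary outcomes.

    Index i in both lists must be the same underlying task posed in
    language A and language B. Returns (b, c, p_value), where b counts
    items solved only in A and c counts items solved only in B.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("outcome lists must be paired and equal in length")
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        # No discordant pairs: the data give no evidence of a difference.
        return b, c, 1.0
    k = min(b, c)
    # Two-sided exact binomial test of b ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

For example, if 8 paired items are solved in language A but not B, and none the other way round, the test gives p = 2/256 ≈ 0.008, which supports a gap. Identical results in both languages give p = 1.0, which says nothing either way.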

Entities

Institutions

  • arXiv

Sources