ARTFEED — Contemporary Art Intelligence

New Benchmark Evaluates Commercial ASR on Code-Switching Speech

ai-technology · 2026-05-20

A new research paper introduces a benchmark for evaluating commercial automatic speech recognition (ASR) systems on code-switching speech, in which speakers alternate between two languages within a single utterance. The benchmark evaluates five commercial ASR providers across four language pairs: Egyptian Arabic–English, Saudi Arabic (Najdi/Hijazi)–English, Persian (Farsi)–English, and German–English. Each language-pair dataset comprises 300 samples, curated in two stages: first, a heuristic filter scores transcripts on five structural code-switching indicators; then an ensemble of GPT-4o and Gemini 1.5 Pro assesses the candidates across six linguistic dimensions. According to the paper, this pipeline cuts LLM scoring costs by roughly 91% compared with exhaustive LLM scoring. The authors argue that code-switching is often overlooked, and they criticize existing benchmarks for assessing only clean, monolingual audio with a single Word Error Rate (WER) metric.
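The two-stage curation idea can be illustrated with a minimal Python sketch. This is not the paper's implementation: the summary does not name the five structural indicators, so the two features used here (number of script switches and share of the minority script) are hypothetical stand-ins, as are the function names and the `keep_fraction` parameter. Script-based detection also only works for pairs written in different scripts (Arabic or Persian with English), not for German–English.

```python
import unicodedata


def token_script(token):
    """Classify a token as 'arabic', 'latin', or 'other' by its first letter.

    Toy detector (an assumption for illustration): Persian letters also
    carry Unicode names starting with 'ARABIC'.
    """
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("ARABIC"):
                return "arabic"
            if name.startswith("LATIN"):
                return "latin"
            return "other"
    return "other"


def switch_features(transcript):
    """Compute two hypothetical structural indicators for one transcript."""
    scripts = [s for s in map(token_script, transcript.split()) if s != "other"]
    if not scripts:
        return {"switches": 0, "minority_ratio": 0.0}
    # Count positions where adjacent tokens are in different scripts.
    switches = sum(1 for a, b in zip(scripts, scripts[1:]) if a != b)
    minority = min(scripts.count("arabic"), scripts.count("latin"))
    return {"switches": switches, "minority_ratio": minority / len(scripts)}


def heuristic_score(transcript):
    """Stage 1: cheap score; higher means more likely code-switched."""
    feats = switch_features(transcript)
    return feats["switches"] + feats["minority_ratio"]


def select_for_llm(transcripts, keep_fraction):
    """Keep only the top-scoring fraction for the expensive LLM stage."""
    ranked = sorted(transcripts, key=heuristic_score, reverse=True)
    keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:keep]


if __name__ == "__main__":
    pool = ["hello world", "انا رايح the meeting بكرة", "مرحبا"]
    # Stage 2 (the LLM ensemble) would score only these survivors.
    print(select_for_llm(pool, keep_fraction=0.34))
```

If LLM cost is assumed to scale linearly with the number of transcripts scored, forwarding only about 9% of candidates to the second stage would correspond to the roughly 91% saving reported; the summary does not state how the paper derives that figure.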

Key facts

  • Benchmark evaluates five commercial ASR providers.
  • Covers four language pairs: Egyptian Arabic–English, Saudi Arabic–English, Persian–English, German–English.
  • Each dataset has 300 samples.
  • Two-stage pipeline: heuristic filter then LLM ensemble (GPT-4o and Gemini 1.5 Pro).
  • Pipeline reduces LLM scoring costs by ~91%.
  • Code-switching is the alternation between two languages within one utterance.
  • Existing benchmarks use clean, monolingual audio and a single WER metric.
  • Published on arXiv with ID 2605.19069.
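
For readers unfamiliar with the WER metric mentioned above, it is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The sketch below shows that standard definition in Python; the function name is illustrative, whitespace tokenization is an assumption, and nothing here reflects how the benchmark itself computes or normalizes its scores.

```python
def word_error_rate(reference, hypothesis):
    """Standard WER: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("i will send the report tomorrow",
                          "i will send report tomorrow"))
```

A single aggregate number like this does not show whether errors fall on one language, the other, or the switch points themselves, which is the kind of limitation the key facts attribute to existing benchmarks.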

Entities

Institutions

  • arXiv

Sources