New Benchmark Evaluates Commercial ASR on Code-Switching Speech

ai-technology · 2026-05-20

A new research paper presents a benchmark for evaluating commercial automatic speech recognition (ASR) systems on code-switching speech, in which speakers alternate between two languages within a single utterance. It covers four language pairs: Egyptian Arabic–English, Saudi Arabic (Najdi/Hijazi)–English, Persian (Farsi)–English, and German–English. Each dataset comprises 300 samples, curated through a two-stage pipeline: a heuristic filter first scores transcripts on five structural code-switching indicators, then an ensemble of GPT-4o and Gemini 1.5 Pro rates the remaining candidates across six linguistic dimensions. Filtering before LLM scoring cuts LLM scoring costs by around 91% compared with scoring every candidate. The paper argues that code-switching is often overlooked and critiques existing benchmarks for assessing only clean, monolingual audio and reporting a single Word Error Rate (WER) metric.
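The two-stage curation described above can be sketched roughly as follows. This is a minimal Python illustration, not the paper's implementation: the single script-switch indicator stands in for the paper's five structural indicators (which are not detailed here), the LLM scorers are placeholder callables rather than real GPT-4o or Gemini 1.5 Pro API calls, and the `keep_fraction` of 0.09 is an assumed value chosen only so that the example shows a reduction of about 91% in LLM calls.

```python
from typing import Callable, List, Sequence, Tuple


def heuristic_score(transcript: str) -> float:
    """Toy structural indicator (an assumption, not from the paper):
    the fraction of adjacent token pairs where the script changes
    between ASCII (Latin) and non-ASCII. It would not work for
    German-English, where both languages use Latin script."""
    tokens = transcript.split()
    is_latin = [tok.isascii() for tok in tokens]
    switches = sum(a != b for a, b in zip(is_latin, is_latin[1:]))
    return switches / max(len(tokens) - 1, 1)


def select_samples(
    transcripts: Sequence[str],
    llm_scorers: Sequence[Callable[[str], float]],
    keep_fraction: float = 0.09,  # assumed shortlist size, illustration only
    n_final: int = 300,
) -> Tuple[List[str], int]:
    """Stage 1: cheap heuristic ranking. Stage 2: score only the
    shortlist with every LLM scorer and average the results."""
    # Stage 1: rank all candidates by the heuristic and keep the top slice.
    ranked = sorted(transcripts, key=heuristic_score, reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    shortlist = ranked[:n_keep]

    # Stage 2: ensemble score (mean over scorers) for shortlisted items only.
    llm_calls = 0
    ensemble = []
    for text in shortlist:
        scores = [scorer(text) for scorer in llm_scorers]
        llm_calls += len(scores)
        ensemble.append((sum(scores) / len(scores), text))

    ensemble.sort(key=lambda pair: pair[0], reverse=True)
    selected = [text for _, text in ensemble[:n_final]]
    return selected, llm_calls
```

Under these assumptions, 100 candidates and two scorers lead to 18 LLM calls instead of the 200 needed to score everything, which is where a saving of roughly 91% would come from.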

Key facts

Benchmark evaluates five commercial ASR providers.
Covers four language pairs: Egyptian Arabic–English, Saudi Arabic–English, Persian–English, German–English.
Each dataset has 300 samples.
Two-stage pipeline: heuristic filter then LLM ensemble (GPT-4o and Gemini 1.5 Pro).
Pipeline reduces LLM scoring costs by ~91%.
Code-switching is alternation between two languages in one utterance.
Existing benchmarks use clean, monolingual audio and a single WER metric.
Published on arXiv with ID 2605.19069.
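For reference, the single WER metric mentioned in the key facts is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below is a generic Python implementation of that standard definition, not code from the paper; the function name `wer` and the whitespace tokenization are assumptions made for illustration.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length.
    Tokenization by whitespace is a simplifying assumption."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(wer("send the report bukra", "send the report tomorrow"))  # 0.25
```

A single number like this treats every word error alike, so it cannot show whether errors cluster at the points where the speaker switches languages.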
