LLM Decoders Don't Amplify Racial Bias in Speech Recognition, Study Finds
A recent study posted to arXiv (2604.21276) investigates whether large language model (LLM) decoders in speech recognition introduce or exacerbate demographic bias. The researchers evaluated nine models spanning three architectural types: CTC (no language model), encoder-decoder (implicit LM), and LLM-based (explicit pretrained decoder). They analyzed approximately 43,000 utterances from Common Voice 24 and Meta's Fair-Speech dataset, which mitigates vocabulary confounds, across five demographic axes: ethnicity, accent, gender, age, and first language.

Notable results: LLM decoders did not amplify racial bias (Granite-8B showed the best ethnicity fairness, with a max/min WER ratio of 2.28); Whisper exhibited severe hallucination on Indian-accented speech, with the insertion rate spiking non-monotonically to 9.62% at large-v3; and audio compression predicted accent fairness. The findings challenge common assumptions about LLM-induced bias in speech recognition.
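The study's headline numbers are word error rate (WER) and insertion rate. As an illustrative sketch (not the paper's evaluation code), both can be derived from a word-level Levenshtein alignment between reference and hypothesis transcripts; insertions are the edit operation that captures hallucinated words like those observed with Whisper:

```python
def align_counts(ref, hyp):
    """Levenshtein alignment over word lists.

    Returns (substitutions, insertions, deletions) for the
    minimum-cost alignment of ref against hyp.
    """
    m, n = len(ref), len(hyp)
    # dp[i][j] = (subs, ins, dels) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0)] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = (0, 0, i)          # delete all remaining ref words
    for j in range(1, n + 1):
        dp[0][j] = (0, j, 0)          # insert all remaining hyp words
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s, ins, d = dp[i - 1][j - 1]
            # diagonal move: match (free) or substitution (+1 sub)
            diag = (s, ins, d) if ref[i - 1] == hyp[j - 1] else (s + 1, ins, d)
            left = dp[i][j - 1]
            up = dp[i - 1][j]
            dp[i][j] = min(
                diag,
                (left[0], left[1] + 1, left[2]),   # insertion
                (up[0], up[1], up[2] + 1),         # deletion
                key=sum,                            # minimize total edits
            )
    return dp[m][n]


def wer(ref, hyp):
    """Word error rate: (S + I + D) / reference length."""
    s, i, d = align_counts(ref.split(), hyp.split())
    return (s + i + d) / max(len(ref.split()), 1)


def insertion_rate(ref, hyp):
    """Fraction of hypothesis words inserted relative to reference length."""
    _, i, _ = align_counts(ref.split(), hyp.split())
    return i / max(len(ref.split()), 1)
```

For example, `wer("a b c", "a x c d")` is 2/3 (one substitution plus one insertion over three reference words), while `insertion_rate` isolates only the inserted word, giving 1/3.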
Key facts
- Study evaluates nine models across CTC, encoder-decoder, and LLM-based architectures
- Uses about 43,000 utterances from Common Voice 24 and Meta's Fair-Speech dataset
- Examines five demographic axes: ethnicity, accent, gender, age, first language
- Granite-8B has the best ethnicity fairness, with a max/min WER ratio of 2.28
- Whisper exhibits pathological hallucination on Indian-accented speech
- Whisper large-v3 shows non-monotonic insertion-rate spike to 9.62%
- Audio compression predicts accent fairness
- LLM decoders do not amplify racial bias on clean audio
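The max/min WER fairness metric cited above can be sketched in a few lines. The per-group error rates below are invented for illustration; only the 2.28 ratio reported for Granite-8B comes from the study:

```python
def max_min_wer_ratio(group_wers):
    """Fairness gap as the ratio of the worst to the best group-level WER.

    A ratio of 1.0 means all demographic groups see identical error
    rates; larger values mean a wider gap between groups.
    """
    vals = list(group_wers.values())
    return max(vals) / min(vals)


# Hypothetical per-group WERs chosen so the ratio lands at the
# study's reported 2.28 for Granite-8B on the ethnicity axis.
group_wer = {"group_a": 0.082, "group_b": 0.105, "group_c": 0.187}
print(round(max_min_wer_ratio(group_wer), 2))  # prints 2.28
```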
Entities
Institutions
- arXiv
- Meta