ARTFEED — Contemporary Art Intelligence

SCRIBE: Diagnostic Framework for Indic ASR Error Analysis

publication · 2026-05-22

Researchers have introduced SCRIBE, a diagnostic framework for analyzing errors in automatic speech recognition (ASR). It breaks errors down into four categories: lexical, punctuation, numeral, and domain-entity. Word error rate (WER) collapses these error types into a single number and unfairly penalizes agglutinative languages such as Hindi, Malayalam, and Kannada; to address this, SCRIBE uses sandhi-tolerant alignment and domain vocabulary injection. Validation by human experts indicates that SCRIBE’s assessments align more closely with expert judgment than WER does. The release includes an LLM curation pipeline, benchmarks, and open-weight rich transcription models for the three languages. The paper is available on arXiv under computer science, in the Computation and Language category.
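To illustrate why WER merges error types, here is a minimal sketch of the standard word-level WER computation (edit distance over whitespace tokens, divided by reference length). It is not SCRIBE's code; the function name `wer` and the English example sentences are assumptions chosen for readability.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


reference = "the meeting is at 5 pm."
missing_period = "the meeting is at 5 pm"   # punctuation slip
wrong_number = "the meeting is at 9 pm."    # numeral error

# Both hypotheses differ from the reference in exactly one token,
# so WER scores them identically.
print(wer(reference, missing_period), wer(reference, wrong_number))
```

Both hypotheses score 1/6, although a dropped period and a wrong time are very different mistakes for a reader. That is the kind of distinction a categorical breakdown is meant to expose.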

Key facts

  • SCRIBE provides categorical error decomposition for ASR.
  • Error categories include lexical, punctuation, numeral, and domain-entity rates.
  • Sandhi-tolerant alignment addresses agglutinative language issues.
  • Domain vocabulary injection improves domain-specific recognition.
  • Human validation indicates SCRIBE aligns with expert judgment more closely than WER.
  • WER fails by collapsing error types and penalizing agglutinative languages.
  • Open-weight rich transcription models released for Hindi, Malayalam, and Kannada.
  • SCRIBE includes an LLM curation pipeline and benchmarks.
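A categorical breakdown of the kind listed above can be sketched as follows. This is a rough illustration under stated assumptions, not the SCRIBE implementation: the rules in `categorize`, the `domain_vocab` parameter, and the function names are all hypothetical, the alignment uses Python's `difflib.SequenceMatcher` as a stand-in, and sandhi-tolerant alignment is not modeled at all.

```python
import string
from collections import Counter
from difflib import SequenceMatcher

# ASCII punctuation plus the Devanagari danda (assumed punctuation set).
PUNCT = set(string.punctuation) | {"।"}
CATEGORIES = ("lexical", "punctuation", "numeral", "domain_entity")


def strip_punct(token: str) -> str:
    return "".join(ch for ch in token if ch not in PUNCT)


def categorize(ref_tok: str, hyp_tok: str, domain_vocab) -> str:
    """Assign one mismatched token pair to a category (hypothetical rules)."""
    if strip_punct(ref_tok) == strip_punct(hyp_tok):
        return "punctuation"      # tokens differ only in punctuation
    if any(ch.isdigit() for ch in ref_tok + hyp_tok):
        return "numeral"          # a digit is involved on either side
    if strip_punct(ref_tok).lower() in domain_vocab:
        return "domain_entity"    # reference word is a known domain term
    return "lexical"


def error_breakdown(reference: str, hypothesis: str, domain_vocab=frozenset()):
    """Per-category error counts divided by the number of reference tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    counts = Counter()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        ref_span, hyp_span = ref[i1:i2], hyp[j1:j2]
        # Pair tokens positionally inside each mismatched span; an empty
        # string stands for a missing token (insertion or deletion).
        for k in range(max(len(ref_span), len(hyp_span))):
            r = ref_span[k] if k < len(ref_span) else ""
            h = hyp_span[k] if k < len(hyp_span) else ""
            counts[categorize(r, h, domain_vocab)] += 1
    n = max(len(ref), 1)
    return {cat: counts[cat] / n for cat in CATEGORIES}


rates = error_breakdown(
    "send 20 mg paracetamol today.",
    "send 12 mg parasite today",
    domain_vocab={"paracetamol"},
)
print(rates)
```

On this example, WER would report three errors out of five tokens as a single figure, while the breakdown attributes one error each to the numeral, domain-entity, and punctuation categories and none to lexical. A real system would need a proper minimum-edit alignment and language-aware rules, which this sketch deliberately leaves out.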

Entities

Institutions

  • arXiv

Sources