SCRIBE: Diagnostic Framework for Indic ASR Error Analysis
Researchers have developed SCRIBE, a diagnostic framework for evaluating automatic speech recognition (ASR). It decomposes errors into four categories: lexical, punctuation, numeral, and domain-entity. Word error rate (WER) collapses these error types into a single number and unfairly penalizes agglutinative languages such as Hindi, Malayalam, and Kannada; SCRIBE addresses both shortcomings with sandhi-tolerant alignment and domain vocabulary injection. Validation by human experts indicates that SCRIBE’s assessments align more closely with expert judgment than WER does. The release includes an LLM curation pipeline, benchmarks, and open-weight rich transcription models for the three languages. The paper is available on arXiv under Computer Science, Computation and Language.
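The two WER shortcomings described above can be illustrated with a minimal sketch. It is a plain word-level edit-distance WER, not SCRIBE's code; the function names and the English sentences are hypothetical, and the English compound is only a stand-in for a sandhi-style merge of two words into one token.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Standard WER: edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical examples (not taken from the SCRIBE benchmarks).
reference = "the dose is 5 mg daily."
punct_error = "the dose is 5 mg daily"     # only the final period is missing
numeral_error = "the dose is 9 mg daily."  # the number itself is wrong

# Both hypotheses contain one token-level error, so WER cannot tell them apart.
print(wer(reference, punct_error), wer(reference, numeral_error))

# Two reference words written as one token count as two errors
# (one substitution plus one deletion), although every character is correct.
merge_wer = wer("we like ice cream", "we like icecream")
print(merge_wer)
```

A harmless missing period and a wrong dosage receive the same score, which is the kind of collapse a categorical decomposition is meant to expose.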
Key facts
- SCRIBE provides categorical error decomposition for ASR.
- Error categories include lexical, punctuation, numeral, and domain-entity rates.
- Sandhi-tolerant alignment addresses agglutinative language issues.
- Domain vocabulary injection improves domain-specific recognition.
- Human expert validation indicates SCRIBE aligns more closely with expert judgment than WER does.
- WER fails by collapsing error types and penalizing agglutinative languages.
- Open-weight rich transcription models released for Hindi, Malayalam, and Kannada.
- SCRIBE includes an LLM curation pipeline and benchmarks.
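The categorical decomposition listed above might look roughly like the following sketch. It assumes the reference and hypothesis tokens have already been aligned one-to-one. The classification rules, the function names, and the `domain_vocab` parameter are illustrative assumptions, not SCRIBE's actual rules or API.

```python
import string
from collections import Counter

CATEGORIES = ("lexical", "punctuation", "numeral", "domain-entity")


def strip_punct(token):
    """Remove leading/trailing ASCII punctuation (assumption: Indic marks
    such as the danda would have to be added for real Hindi text)."""
    return token.strip(string.punctuation)


def classify(ref_tok, hyp_tok, domain_vocab):
    """Return an error category for one aligned pair, or None if it matches."""
    if ref_tok == hyp_tok:
        return None
    ref_core, hyp_core = strip_punct(ref_tok), strip_punct(hyp_tok)
    if ref_core == hyp_core:
        return "punctuation"      # tokens differ only in punctuation
    if ref_core.isdigit() or hyp_core.isdigit():
        return "numeral"          # a digit string was misrecognized
    if ref_core.lower() in domain_vocab:
        return "domain-entity"    # a listed domain term was misrecognized
    return "lexical"              # any other word-level mismatch


def category_rates(aligned_pairs, domain_vocab):
    """Per-category error rate: errors in a category / aligned reference tokens."""
    counts = Counter()
    for ref_tok, hyp_tok in aligned_pairs:
        label = classify(ref_tok, hyp_tok, domain_vocab)
        if label is not None:
            counts[label] += 1
    total = len(aligned_pairs)
    return {c: counts[c] / total for c in CATEGORIES}


# Hypothetical pre-aligned (reference, hypothesis) token pairs.
pairs = [
    ("give", "give"),
    ("paracetamol", "parasite"),
    ("500", "five"),
    ("mg", "mg"),
    ("twice", "twice"),
    ("daily.", "daily"),
    ("after", "often"),
    ("meals", "meals"),
]
rates = category_rates(pairs, domain_vocab={"paracetamol"})
print(rates)
```

Plain WER would report this utterance as a single figure (4 errors over 8 words), while the per-category rates show one error of each type.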
Entities
Institutions
- arXiv