WASIL: Arabic Spoken Interaction Dataset for LLMs

ai-technology · 2026-05-20

Researchers have introduced WASIL, a dataset of 8,529 Arabic spoken interaction turns collected from real-world use. It includes audio recordings, ASR hypotheses, assistant replies, and user feedback; 14.2% of turns received a dislike. A 2,000-turn test set covers Modern Standard Arabic (MSA) and four prominent dialects. Gold transcripts were produced at low cost through post-editing guided by agreement across multiple ASR systems, and turns were annotated for answerability to separate intrinsically unanswerable requests from failures caused by ASR errors. Together, these resources support scalable, reference-free evaluation of LLM responses in Arabic voice assistants.
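
The article does not say how multi-ASR agreement was measured or used. A minimal sketch, assuming agreement is word-level edit distance between every pair of ASR hypotheses for a turn: turns whose systems largely agree go to light review, the rest to full post-editing, and the hypothesis closest to all others serves as the editor's starting draft. The function names, tier names, and the 0.2 threshold are hypothetical, not taken from WASIL.

```python
from itertools import combinations


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over word tokens (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                # delete r
                curr[j - 1] + 1,            # insert h
                prev[j - 1] + int(r != h),  # substitute (or match)
            )
        prev = curr
    return prev[-1]


def pairwise_disagreement(hypotheses: list[str]) -> float:
    """Mean length-normalized edit distance over all pairs of ASR hypotheses."""
    scores = []
    for a, b in combinations(hypotheses, 2):
        ta, tb = a.split(), b.split()
        scores.append(word_edit_distance(ta, tb) / max(len(ta), len(tb), 1))
    return sum(scores) / len(scores) if scores else 0.0


def pick_seed(hypotheses: list[str]) -> str:
    """Return the hypothesis with the smallest total distance to the others."""
    tokens = [h.split() for h in hypotheses]

    def total(i: int) -> int:
        return sum(word_edit_distance(tokens[i], tokens[j])
                   for j in range(len(tokens)) if j != i)

    return hypotheses[min(range(len(tokens)), key=total)]


def route_turn(hypotheses: list[str], threshold: float = 0.2) -> tuple[str, str]:
    """Hypothetical triage: (review tier, seed transcript for the human editor)."""
    disagreement = pairwise_disagreement(hypotheses)
    tier = "light_review" if disagreement <= threshold else "full_post_edit"
    return tier, pick_seed(hypotheses)


if __name__ == "__main__":
    hyps = ["a b c", "x y z", "a b c"]  # placeholder tokens, not WASIL data
    print(route_turn(hyps))  # ('full_post_edit', 'a b c')
```

Choosing the most central hypothesis as the draft means editors start from the output most systems agree on, which is one plausible way agreement could reduce post-editing cost.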

Key facts

WASIL dataset contains 8,529 spoken interaction turns
14.2% of turns have dislike feedback
Test set of 2,000 turns covers MSA and four dialects
Gold transcripts via multi-ASR agreement-guided post-editing
Answerability annotation separates intrinsic unanswerability from ASR errors
Dataset enables reference-free evaluation of LLM responses
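
To make the listed components concrete, here is a hypothetical record layout for one turn. The field names, the three feedback values, and the three answerability labels are assumptions for illustration; the source states only that turns carry audio, an ASR hypothesis, an assistant reply, user feedback, and an annotation distinguishing intrinsic unanswerability from ASR errors.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Feedback(Enum):
    # Assumed values; the source only reports that some turns are dislikes.
    LIKE = "like"
    DISLIKE = "dislike"
    NONE = "none"


class Answerability(Enum):
    # Hypothetical labels mirroring the distinction described in the source.
    ANSWERABLE = "answerable"
    INTRINSICALLY_UNANSWERABLE = "intrinsically_unanswerable"
    ASR_ERROR = "asr_error"


@dataclass
class Turn:
    # Hypothetical field names for the components the source lists.
    audio_path: str
    asr_hypothesis: str
    assistant_reply: str
    feedback: Feedback
    dialect: str  # "MSA" or one of the four dialects
    gold_transcript: Optional[str] = None
    answerability: Optional[Answerability] = None


def dislike_rate(turns: list[Turn]) -> float:
    """Fraction of turns whose feedback is a dislike."""
    if not turns:
        return 0.0
    return sum(t.feedback is Feedback.DISLIKE for t in turns) / len(turns)


def asr_attributable(turns: list[Turn]) -> list[Turn]:
    """Turns annotated as failing because of ASR rather than the request itself."""
    return [t for t in turns if t.answerability is Answerability.ASR_ERROR]
```

At 8,529 turns, the reported 14.2% dislike rate corresponds to about 1,210 disliked turns.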

Sources

arXiv cs.AI — 2026-05-19