WASIL: Arabic Spoken Interaction Dataset for LLMs
Researchers have introduced WASIL, a dataset of 8,529 Arabic spoken interaction prompts collected in real-world settings. Each entry includes the audio recording, ASR hypotheses, the assistant's reply, and user feedback; 14.2% of turns received a dislike. The dataset also contains a 2,000-turn test set covering Modern Standard Arabic and four prominent dialects. Gold transcripts were produced at low cost through multi-ASR agreement-guided post-editing, and turns were annotated to distinguish intrinsic unanswerability from ASR-related errors. The dataset supports scalable, reference-free evaluation of LLM responses in Arabic voice assistant systems.
Key facts
- WASIL dataset contains 8,529 spoken interaction turns
- 14.2% of turns have dislike feedback
- Test set of 2,000 turns covers MSA and four dialects
- Gold transcripts via multi-ASR agreement-guided post-editing
- Answerability annotation separates unanswerability from ASR errors
- Dataset enables reference-free evaluation of LLM responses
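The summary does not say how multi-ASR agreement guides post-editing, so the following is only a minimal sketch of one plausible reading: measure how much several ASR hypotheses for the same turn agree, keep the most central one as a draft, and send low-agreement turns to a human editor. The function names, the word-level similarity measure (Python's `difflib.SequenceMatcher`), and the 0.8 threshold are all illustrative assumptions, not details from WASIL.

```python
from difflib import SequenceMatcher
from itertools import combinations


def _similarity(a_tokens, b_tokens):
    """Word-level similarity ratio in [0, 1] between two token lists."""
    return SequenceMatcher(None, a_tokens, b_tokens).ratio()


def pairwise_agreement(hypotheses):
    """Mean similarity over all pairs of hypotheses (1.0 if fewer than two)."""
    token_lists = [h.split() for h in hypotheses]
    pairs = list(combinations(token_lists, 2))
    if not pairs:
        return 1.0
    return sum(_similarity(a, b) for a, b in pairs) / len(pairs)


def pick_draft(hypotheses):
    """Return the hypothesis that is, on average, most similar to the others."""
    token_lists = [h.split() for h in hypotheses]

    def mean_similarity(i):
        others = [t for j, t in enumerate(token_lists) if j != i]
        if not others:
            return 1.0
        return sum(_similarity(token_lists[i], o) for o in others) / len(others)

    best_index = max(range(len(hypotheses)), key=mean_similarity)
    return hypotheses[best_index]


def triage(hypotheses, threshold=0.8):
    """Hypothetical triage step: flag low-agreement turns for human post-editing.

    The threshold is an assumed value chosen for illustration only.
    """
    score = pairwise_agreement(hypotheses)
    return {
        "draft": pick_draft(hypotheses),
        "agreement": score,
        "needs_review": score < threshold,
    }
```

Under this reading, turns where the ASR systems agree need little or no editing, which is where the cost saving would come from; human effort is concentrated on turns where the systems disagree.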