WARDEN: AI System Transcribes and Translates Endangered Wardaman Language
Researchers have built WARDEN, a language model system that transcribes Wardaman, an endangered Indigenous language of Australia, and translates it into English. The central constraint is data: only 6 hours of annotated audio are available for training. Unlike conventional approaches that use a single model for both tasks, WARDEN uses separate models: one converts Wardaman audio into a phonemic transcription, and a second translates that transcription into English. To make the most of the scarce data, Wardaman token initialization in the transcription model draws on Sundanese, a language with phonemes similar to Wardaman's. The approach points to a promising strategy for preserving endangered languages with very limited resources.
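The two-stage design described above can be sketched as a simple composition of two callables. This is an illustrative sketch only, not the authors' implementation: the names `TwoStagePipeline`, `dummy_transcriber`, and `dummy_translator` are hypothetical, and the placeholder functions stand in for the real models, which the summary does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical type aliases for illustration; not taken from the paper.
Audio = Sequence[float]                 # raw waveform samples
Transcriber = Callable[[Audio], str]    # audio -> phonemic transcription
Translator = Callable[[str], str]       # phonemic transcription -> English


@dataclass
class TwoStagePipeline:
    """Chains a transcription model and a separate translation model."""
    transcriber: Transcriber
    translator: Translator

    def run(self, audio: Audio) -> Tuple[str, str]:
        # Stage 1: audio to phonemic transcription.
        phonemes = self.transcriber(audio)
        # Stage 2: phonemic transcription to English.
        english = self.translator(phonemes)
        # Return both so the intermediate transcription can be inspected.
        return phonemes, english


# Placeholder stand-ins so the sketch runs; real models would replace these.
def dummy_transcriber(audio: Audio) -> str:
    return f"<phonemes for {len(audio)} samples>"


def dummy_translator(phonemes: str) -> str:
    return f"<English for {phonemes}>"


pipeline = TwoStagePipeline(dummy_transcriber, dummy_translator)
phonemes, english = pipeline.run([0.0, 0.1, -0.1])
```

Keeping the stages separate means the phonemic transcription is available as an intermediate output, and either model could in principle be retrained or swapped without touching the other.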
Key facts
- WARDEN is a language model system for transcribing Wardaman and translating it into English.
- Only 6 hours of annotated audio data are available for training.
- The system uses separate models for transcription and translation.
- Transcription converts audio to phonemic transcription.
- Translation converts phonemic transcription to English.
- Wardaman token initialization uses Sundanese due to phonemic similarity.
- The research is presented in arXiv paper 2605.13846.
- Wardaman is an endangered Australian Indigenous language.
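The summary does not say how the Sundanese-based token initialization is carried out. One common way to initialize a new token from a related one is to copy the donor token's embedding vector as the starting point. The sketch below assumes that approach; the function `init_from_donor`, the token names, and the vector values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Embedding = List[float]


def init_from_donor(table: Dict[str, Embedding],
                    new_token: str,
                    donor_token: str) -> None:
    """Add new_token to the table, starting from a copy of donor_token's vector."""
    if new_token in table:
        raise ValueError(f"{new_token!r} already has an embedding")
    # Copy the list so later updates to the new row do not alter the donor row.
    table[new_token] = list(table[donor_token])


# Toy embedding table with made-up values; token names are placeholders.
table = {
    "<donor_sundanese>": [0.2, -0.5, 0.1],
    "<other>": [0.9, 0.3, -0.4],
}
init_from_donor(table, "<new_wardaman>", "<donor_sundanese>")
```

Starting from a phonemically similar language's vector, rather than random values, gives fine-tuning a more useful starting point when only a few hours of data are available.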
Entities
Institutions
- arXiv
Locations
- Australia