Hindi Speech Recognition Achieves 91.79% Accuracy with CNN-Based Keyword Spotting
A study on keyword spotting (KWS) for Hindi speech recognition reports 91.79% accuracy on a dataset of 40,000 audio samples. The system extracts Mel Frequency Cepstral Coefficients (MFCCs) from the audio and feeds them as input features to Convolutional Neural Networks (CNNs). The samples were recorded at 44 kHz and average 1.9 seconds in duration. The approach targets on-device recognition of user-specific queries while keeping computational cost low. Several CNN architectures were compared, and the best-performing one reached the reported accuracy. The paper was posted on arXiv in the computer science listings under the Sound category.
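The MFCC front end described above can be illustrated with a minimal NumPy sketch. It assumes common defaults that the summary does not state: 25 ms Hamming windows with a 10 ms hop, a 2048-point FFT, 40 HTK-style mel bands and 13 coefficients. These are hypothetical choices, not the study's settings. At 44 kHz, a clip of the average 1.9 s length holds 44,000 × 1.9 = 83,600 samples, which these settings turn into a 188 × 13 feature matrix.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters from 0 Hz to Nyquist, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):       # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    m = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    m[0] /= np.sqrt(2.0)
    return m

def mfcc(signal, sr=44_000, n_fft=2048, win_ms=25.0, hop_ms=10.0,
         n_mels=40, n_mfcc=13):
    """Returns an (n_frames, n_mfcc) MFCC matrix. All defaults are assumptions."""
    win = int(round(sr * win_ms / 1000))     # 1100 samples at 44 kHz
    hop = int(round(sr * hop_ms / 1000))     # 440 samples at 44 kHz
    if len(signal) < win:
        signal = np.pad(signal, (0, win - len(signal)))
    n_frames = 1 + (len(signal) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win)   # overlapping windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    return log_mel @ dct_matrix(n_mfcc, n_mels).T

sr = 44_000
t = np.arange(int(round(1.9 * sr))) / sr     # 83,600 samples
clip = np.sin(2 * np.pi * 440.0 * t)         # synthetic stand-in for a recording
print(mfcc(clip, sr).shape)                  # (188, 13)
```

Libraries such as librosa provide tuned MFCC implementations; the hand-rolled version only makes each step (framing, power spectrum, mel filtering, log, DCT) explicit.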
Key facts
- Dataset of 40,000 Hindi audio samples used
- Sampling rate of 44 kHz
- Average audio duration of 1.9 seconds
- MFCC features extracted from raw audio
- Best-performing CNN classifier achieves 91.79% accuracy
- Focus on on-device, user-specific keyword spotting
- Multiple CNN architectures evaluated
- Published on arXiv (ID: 2605.02928)
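The summary does not describe the winning architecture, so the following is only a generic sketch of how a small CNN could map one MFCC matrix to keyword probabilities: one 3×3 convolution with ReLU, 2×2 max pooling, global average pooling and a softmax output layer. The filter count (`N_FILTERS`), the number of keywords (`N_KEYWORDS`) and the random, untrained weights are hypothetical placeholders, not values from the study.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

N_KEYWORDS = 10   # hypothetical: the study's keyword count is not given here
N_FILTERS = 8     # hypothetical layer width

def conv2d_valid(x, w, b):
    """Single-channel 'valid' convolution (cross-correlation, as in CNN libraries).
    x: (H, W), w: (C, kh, kw), b: (C,) -> (C, H-kh+1, W-kw+1)"""
    windows = sliding_window_view(x, w.shape[1:])    # (H', W', kh, kw)
    return np.einsum("ijkl,ckl->cij", windows, w) + b[:, None, None]

def max_pool2(x):
    """2x2 max pooling with stride 2; drops an odd trailing row/column."""
    c, h, w = x.shape
    x = x[:, : h - h % 2, : w - w % 2]
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

def init_params(rng):
    # Random weights: this shows the data flow only, not a trained model.
    return {
        "conv_w": rng.normal(0, 0.1, (N_FILTERS, 3, 3)),
        "conv_b": np.zeros(N_FILTERS),
        "fc_w": rng.normal(0, 0.1, (N_KEYWORDS, N_FILTERS)),
        "fc_b": np.zeros(N_KEYWORDS),
    }

def forward(mfcc_feats, p):
    """mfcc_feats: (n_frames, n_mfcc) -> probabilities of shape (N_KEYWORDS,)"""
    h = np.maximum(conv2d_valid(mfcc_feats, p["conv_w"], p["conv_b"]), 0.0)
    h = max_pool2(h)                 # (N_FILTERS, H/2, W/2)
    h = h.mean(axis=(1, 2))          # global average pooling -> (N_FILTERS,)
    return softmax(p["fc_w"] @ h + p["fc_b"])

rng = np.random.default_rng(0)
params = init_params(rng)
feats = rng.normal(size=(188, 13))   # stands in for one 1.9 s MFCC matrix
probs = forward(feats, params)
print(probs.shape)                   # (10,)
print("predicted keyword index:", int(np.argmax(probs)))
```

Global average pooling keeps the parameter count independent of clip length, which suits the on-device, low-compute goal; whether the study used it is not stated in this summary.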
Entities
Institutions
- arXiv