Generative Meta-Continual Learning Enables 1000-Class Few-Shot Spoken Word Classification
A recent study on arXiv (2605.13075) shows that a spoken word classifier can learn to distinguish among 1000 classes from just five examples per class. Using the Generative Meta-Continual Learning (GeMCL) algorithm, the researchers compared their model against baselines that were repeatedly trained or fine-tuned. GeMCL delivered remarkably stable performance, matching a frozen HuBERT model with a repeatedly trained classifier head while adapting 2000 times faster and training on less than half the data. The work highlights the potential of scaling few-shot spoken word classification to much larger class sets.
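The core idea behind a generative few-shot classifier of this kind can be sketched as follows: a frozen encoder (such as HuBERT) maps each utterance to an embedding, a simple per-class generative model is fitted from the handful of support examples, and queries are assigned to the class whose model explains them best. The sketch below, a minimal illustration and not the paper's implementation, uses diagonal Gaussians per class; GeMCL additionally meta-learns a Bayesian prior over these class statistics, which this sketch omits.

```python
import numpy as np

def fit_class_gaussians(support, labels, eps=1e-6):
    """Fit one diagonal Gaussian per class from few-shot support embeddings.

    support: (n_support, dim) array of encoder embeddings (assumed precomputed
    by a frozen speech encoder); labels: (n_support,) integer class labels.
    """
    classes = np.unique(labels)
    means, variances = [], []
    for c in classes:
        x = support[labels == c]               # the few shots for class c
        means.append(x.mean(axis=0))
        variances.append(x.var(axis=0) + eps)  # eps keeps variances positive
    return classes, np.stack(means), np.stack(variances)

def classify(query, classes, means, variances):
    """Assign each query embedding to the class with the highest
    Gaussian log-likelihood under the fitted per-class models."""
    diff = query[:, None, :] - means[None, :, :]   # (n_query, n_class, dim)
    log_lik = -0.5 * ((diff ** 2) / variances
                      + np.log(2 * np.pi * variances)).sum(axis=-1)
    return classes[np.argmax(log_lik, axis=1)]
```

Because adaptation amounts to computing per-class means and variances rather than running gradient descent, adding a new class is essentially free, which is consistent with the large adaptation-speed gap the study reports over retraining a classifier head.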
Key facts
- arXiv paper 2605.13075
- 1000 classes with five shots per class
- Generative Meta-Continual Learning (GeMCL) algorithm used
- Compared against repeatedly trained or fine-tuned baselines
- GeMCL produces stable performance
- 2000 times faster adaptation than frozen HuBERT with trained classifier head
- Trained on less than half the data
- Scaling capability for few-shot spoken word classification