Sparse MERIT: Multi-Task Learning for Speech Enhancement and Emotion Recognition
Sparse Mixture-of-Experts Representation Integration Technique (Sparse MERIT) is a newly proposed multi-task learning framework that jointly optimizes speech enhancement (SE) and speech emotion recognition (SER). It targets two coupled problems: SER performance degrades under noisy conditions, and SE can introduce processing artifacts that obscure emotional cues. Sparse MERIT applies frame-wise expert routing over self-supervised speech representations, with task-specific gating networks dynamically selecting experts from a shared pool. This design is parameter-efficient and flexible, and it mitigates the gradient interference and representational conflicts common in conventional shared-backbone models. The method is detailed in a paper on arXiv (2509.08470).
Key facts
- Sparse MERIT is a multi-task learning framework for SE and SER.
- It uses frame-wise expert routing over self-supervised speech representations.
- Task-specific gating networks dynamically select experts from a shared pool.
- It addresses gradient interference and representational conflicts.
- The approach is parameter-efficient and flexible.
- SER performance degrades under noisy conditions.
- SE can introduce artifacts that obscure emotional cues.
- The paper is available on arXiv with ID 2509.08470.
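The routing scheme described above can be sketched in code. The following is a minimal NumPy illustration, not the paper's implementation: expert count, dimensions, top-k value, and the linear form of the experts and gates are all assumptions chosen for clarity. It shows the core idea of frame-wise routing, where each task's gating network independently picks a weighted subset of experts from one shared pool.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SharedExpertPool:
    """A pool of simple linear experts shared by all tasks (hypothetical sizes)."""
    def __init__(self, n_experts, d_in, d_out):
        self.W = rng.standard_normal((n_experts, d_in, d_out)) * 0.02

    def __call__(self, x, idx):
        # x: (T, d_in) frame features; idx: (T, k) selected expert indices per frame
        # Gather each frame's k expert weight matrices and apply them.
        return np.einsum('td,tkdo->tko', x, self.W[idx])  # (T, k, d_out)

class TaskGate:
    """A task-specific gating network: routes each frame to its top-k experts."""
    def __init__(self, n_experts, d_in, k=2):
        self.Wg = rng.standard_normal((d_in, n_experts)) * 0.02
        self.k = k

    def __call__(self, x):
        logits = x @ self.Wg                              # (T, n_experts)
        idx = np.argsort(logits, axis=-1)[:, -self.k:]    # top-k indices per frame
        w = softmax(np.take_along_axis(logits, idx, axis=-1), axis=-1)
        return idx, w

def route(x, pool, gate):
    # Frame-wise sparse routing: weighted sum over each frame's selected experts.
    idx, w = gate(x)
    y = pool(x, idx)                                      # (T, k, d_out)
    return (w[..., None] * y).sum(axis=1)                 # (T, d_out)

# Toy run: 50 frames of 768-dim self-supervised features, 8 shared experts,
# and separate gates for the SE and SER branches (all sizes illustrative).
T, d, n_exp = 50, 768, 8
feats = rng.standard_normal((T, d))
pool = SharedExpertPool(n_exp, d, 256)
se_gate, ser_gate = TaskGate(n_exp, d), TaskGate(n_exp, d)
se_repr = route(feats, pool, se_gate)    # representation for the SE branch
ser_repr = route(feats, pool, ser_gate)  # representation for the SER branch
```

Because each gate selects only k of the shared experts per frame, the two tasks reuse parameters without being forced through a single shared pathway, which is the mechanism the paper credits for reducing gradient interference.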