Synthetic Data Scaling for Low-Resource Spoken Language Models
A new arXiv paper (2605.27383) identifies a Stability-Expressivity Gap in Spoken Language Models (SLMs) for low-resource languages: scaling with synthetic data improves phonetic accuracy but suppresses prosodic variability, an effect termed Synthetic Erosion. To recover expressivity, the authors propose Disentanglement-Guided Self-Alignment (DGSA), which relies on prosody-timbre separation. The work targets regimes where authentic data is scarce.
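To make "suppressed prosodic variability" concrete, here is a minimal sketch of one way such an effect could be quantified: the spread of the pitch (F0) contour in semitones, compared between an authentic and a synthetic utterance. This is an illustrative assumption, not the metric used in the paper. The function name, the convention that 0 marks an unvoiced frame, and the toy contours are all hypothetical.

```python
import math
import statistics


def f0_variability_semitones(f0_hz):
    """Standard deviation of voiced F0, in semitones around the median.

    Assumes (hypothetically) that a value of 0 marks an unvoiced frame.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    ref = statistics.median(voiced)
    # Convert each voiced frame to semitones relative to the median pitch.
    semitones = [12.0 * math.log2(f / ref) for f in voiced]
    return statistics.stdev(semitones)


# Hypothetical toy contours in Hz; these are NOT data from the paper.
authentic_f0 = [180, 220, 0, 260, 150, 240, 0, 200]
synthetic_f0 = [198, 202, 0, 205, 196, 201, 0, 200]

authentic_var = f0_variability_semitones(authentic_f0)
synthetic_var = f0_variability_semitones(synthetic_f0)

# A ratio below 1 means the synthetic contour is flatter than the authentic one.
erosion_ratio = synthetic_var / authentic_var
print(f"authentic: {authentic_var:.2f} st, synthetic: {synthetic_var:.2f} st, "
      f"ratio: {erosion_ratio:.2f}")
```

A ratio well below 1 on real recordings would be consistent with the flattening described above, although a single pitch statistic captures only one aspect of prosody and ignores rhythm and energy.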
Key facts
- arXiv paper ID: 2605.27383
- Announce type: cross
- Identifies Stability-Expressivity Gap in SLMs
- Synthetic data causes Synthetic Erosion of expressivity
- Proposes DGSA framework for prosody-timbre separation
- Focuses on low-resource languages
- Synthetic data is the primary scaling strategy when authentic data is scarce
- Aims to bridge gap between stability and expressivity
Entities
Institutions
- arXiv