Synthetic Data Scaling for Low-Resource Spoken Language Models

ai-technology · 2026-05-28

A new arXiv paper (2605.27383) identifies a Stability-Expressivity Gap in Spoken Language Models (SLMs) for low-resource languages: scaling with synthetic data improves phonetic accuracy but suppresses prosodic variability, an effect the authors call Synthetic Erosion. To recover expressivity, they propose Disentanglement-Guided Self-Alignment (DGSA), which relies on separating prosody from timbre. The work targets regimes where authentic speech data is scarce.
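To make "suppressed prosodic variability" concrete, here is a minimal Python sketch of one way such an effect could be quantified. It is not the paper's metric: the pitch-spread proxy, the function names `f0_variability_semitones` and `expressivity_ratio`, and the toy pitch contours are all assumptions made for illustration. The idea is to compare how much pitch (F0) moves within synthetic utterances versus authentic ones.

```python
import math
import statistics


def f0_variability_semitones(f0_hz):
    """Spread of a pitch contour in semitones (hypothetical proxy, not the paper's metric).

    Frames with F0 <= 0 are treated as unvoiced and skipped.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        return 0.0
    # Express each frame relative to the utterance median so that
    # speakers with different base pitch are comparable.
    ref = statistics.median(voiced)
    semitones = [12.0 * math.log2(f / ref) for f in voiced]
    return statistics.pstdev(semitones)


def expressivity_ratio(synthetic_contours, authentic_contours):
    """Mean pitch spread of synthetic speech divided by that of authentic speech.

    Values well below 1.0 would suggest flatter prosody in the synthetic set.
    """
    syn = statistics.mean([f0_variability_semitones(c) for c in synthetic_contours])
    auth = statistics.mean([f0_variability_semitones(c) for c in authentic_contours])
    return syn / auth if auth > 0 else float("nan")


# Toy F0 contours in Hz (0 marks an unvoiced frame); invented for illustration only.
authentic = [[180, 220, 0, 260, 200], [150, 190, 230, 0, 170]]
synthetic = [[200, 205, 0, 198, 202], [180, 182, 179, 0, 181]]

ratio = expressivity_ratio(synthetic, authentic)
print(f"synthetic/authentic pitch-spread ratio: {ratio:.2f}")
```

In practice the contours would come from a pitch tracker run over real recordings, and pitch spread is only one facet of prosody; duration and energy variation would need separate measures.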

Key facts

arXiv paper ID: 2605.27383
Announce type: cross
Identifies Stability-Expressivity Gap in SLMs
Synthetic data causes Synthetic Erosion of expressivity
Proposes DGSA framework for prosody-timbre separation
Focuses on low-resource languages
Synthetic data is the primary scaling strategy
Aims to bridge gap between stability and expressivity
