ARTFEED — Contemporary Art Intelligence

Synthetic Data Scaling for Low-Resource Spoken Language Models

ai-technology · 2026-05-28

A new paper on arXiv (2605.27383) identifies a Stability-Expressivity Gap in Spoken Language Models (SLMs) for low-resource languages: scaling with synthetic data improves phonetic accuracy but suppresses prosodic variability, an effect the authors call Synthetic Erosion. To recover expressivity, they propose Disentanglement-Guided Self-Alignment (DGSA), which separates prosody from timbre. The work targets regimes where authentic data is scarce.
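
To make "suppressed prosodic variability" concrete, here is a minimal sketch of one way such an effect could be quantified. It is not the paper's metric: the function names, the choice of pitch (F0) coefficient of variation as a prosody proxy, and the toy contours are all assumptions made for illustration.

```python
import statistics

def f0_variability(f0_hz):
    """Coefficient of variation of voiced F0 frames.

    Assumption: unvoiced frames are encoded as 0 (or None) and skipped.
    """
    voiced = [f for f in f0_hz if f]
    if len(voiced) < 2:
        return 0.0
    return statistics.stdev(voiced) / statistics.mean(voiced)

def expressivity_ratio(synthetic_contours, authentic_contours):
    """Mean F0 variability of synthetic speech relative to authentic speech.

    A value below 1.0 means the synthetic contours are flatter on average.
    This ratio is a hypothetical proxy, not a measure defined in the paper.
    """
    syn = statistics.mean(f0_variability(c) for c in synthetic_contours)
    auth = statistics.mean(f0_variability(c) for c in authentic_contours)
    if auth == 0:
        raise ValueError("authentic contours show no pitch variation")
    return syn / auth

# Toy F0 contours in Hz (made-up values, for illustration only).
authentic = [[180, 220, 0, 150, 260], [200, 140, 230, 0, 170]]
synthetic = [[195, 200, 0, 198, 205], [190, 196, 202, 0, 199]]

ratio = expressivity_ratio(synthetic, authentic)
print(f"synthetic/authentic F0 variability: {ratio:.2f}")
```

In practice the contours would come from a pitch tracker run over real and synthetic utterances, and F0 alone captures only part of prosody; energy and duration would need similar treatment.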

Key facts

  • arXiv paper ID: 2605.27383
  • Announce type: cross
  • Identifies Stability-Expressivity Gap in SLMs
  • Synthetic data causes Synthetic Erosion of expressivity
  • Proposes DGSA framework for prosody-timbre separation
  • Focuses on low-resource languages
  • Synthetic data is the primary scaling strategy
  • Aims to bridge gap between stability and expressivity

Entities

Institutions

  • arXiv
