ARTFEED — Contemporary Art Intelligence

Unsupervised Neural Networks Spontaneously Concatenate Speech

ai-technology · 2026-04-25

A recent study posted to arXiv suggests that concatenation, a basic suboperation of syntax, can be modeled directly from raw speech using unsupervised deep neural networks. The researchers trained ciwGAN/fiwGAN models, GAN architectures built on convolutional networks, on acoustic recordings of single words. Remarkably, despite never having access to multi-word data, the models began producing outputs containing two or even three concatenated words, a phenomenon the authors call "spontaneous concatenation." The result replicated across multiple independently trained models with different hyperparameters and datasets. Networks trained on two-word inputs additionally produced novel, unseen combinations, an early sign of compositionality. The findings challenge the traditional text-centric focus of computational models of syntax.
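To make the setup concrete, here is a minimal sketch of the latent-code convention used by ciwGAN-style models: the generator's input concatenates a one-hot "word class" code with uniform noise, and the network maps that latent vector to a fixed-length waveform. The dimensions below (5 classes, 95 noise dimensions, 16384-sample output) and the single linear layer standing in for the convolutional generator are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, loosely following the WaveGAN/ciwGAN convention:
NUM_CLASSES = 5      # assumed number of lexical classes in the latent code
NOISE_DIM = 95       # assumed remaining latent dimensions (uniform noise)
AUDIO_LEN = 16384    # WaveGAN's standard ~1-second output at 16 kHz

def make_latent(word_class: int) -> np.ndarray:
    """Build a ciwGAN-style latent: one-hot class code + uniform noise."""
    one_hot = np.zeros(NUM_CLASSES)
    one_hot[word_class] = 1.0
    noise = rng.uniform(-1.0, 1.0, NOISE_DIM)
    return np.concatenate([one_hot, noise])

# Toy stand-in for the convolutional generator: one random linear map
# followed by tanh, just to show the latent -> waveform shape contract.
W = rng.normal(0.0, 0.01, (NUM_CLASSES + NOISE_DIM, AUDIO_LEN))

def generate(z: np.ndarray) -> np.ndarray:
    """Map a latent vector to a waveform in [-1, 1]."""
    return np.tanh(z @ W)

wave = generate(make_latent(word_class=2))
```

In the actual models, a separate Q-network is trained to recover the class code from the generated audio, which pressures the generator to make the code informative; the paper's observation is that outputs sampled from such networks can contain more words than any single training item.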

Key facts

  • arXiv:2305.01626v4
  • Focus on concatenation as a basic suboperation of syntax
  • Spontaneous concatenation observed in ciwGAN/fiwGAN models
  • Models trained on single words only
  • Outputs with two or three concatenated words emerged
  • Replicated in multiple models with different hyperparameters
  • Novel word combinations formed from two-word training
  • Precursors to compositionality detected in outputs

Entities

Institutions

  • arXiv

Sources