ARTFEED — Contemporary Art Intelligence

S2ST-Omni 2: Structured Language Conditioning for Multilingual Speech Translation

ai-technology · 2026-05-18

Researchers propose S2ST-Omni 2, a many-to-one (many source languages into a single target language) compositional speech-to-speech translation (S2ST) framework that replaces flat language labels with structured typological priors. The reformulation targets a limitation of existing S2ST systems: they either ignore source-language information or encode each language as an independent, flat embedding, which overlooks the systematic linguistic structure that languages share. The framework works at three levels: typology-informed hierarchical language encoding, dynamically gated language-aware adaptation, and structured conditioning for multilingual adaptation. Its goal is more data-efficient multilingual adaptation when supervised S2ST data are scarce. The work is available on arXiv under identifier 2605.16026.
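To make the contrast between flat labels and typological priors concrete, here is a toy Python sketch. It is not the S2ST-Omni 2 implementation. The four-language hierarchy, the additive family + branch + language composition, and every function name are illustrative assumptions. The sketch only shows why a hierarchical code lets related languages share structure while independent flat embeddings do not.

```python
"""Toy contrast between flat language labels and a hierarchical,
typology-informed language code.

Hypothetical illustration: the family/branch table, the additive
composition and all names are assumptions, not the S2ST-Omni 2 design.
"""
import math
import random

DIM = 64  # embedding size for the toy example

# Hypothetical typological hierarchy: language -> (family, branch).
TYPOLOGY = {
    "es": ("indo-european", "romance"),
    "fr": ("indo-european", "romance"),
    "de": ("indo-european", "germanic"),
    "ja": ("japonic", "japanese"),
}


def random_vector(rng, dim=DIM):
    """Draw a vector of i.i.d. standard-normal components."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def flat_embeddings(languages, seed=0):
    """One independent vector per language: nothing is shared, so a
    low-resource language cannot borrow from its relatives."""
    rng = random.Random(seed)
    return {lang: random_vector(rng) for lang in languages}


def hierarchical_embeddings(typology, seed=0, residual_scale=0.3):
    """Compose each language code as family + branch + a small
    language-specific residual, so related languages share most of it."""
    rng = random.Random(seed)
    families, branches, codes = {}, {}, {}
    for lang, (family, branch) in typology.items():
        if family not in families:
            families[family] = random_vector(rng)
        if branch not in branches:
            branches[branch] = random_vector(rng)
        residual = [residual_scale * x for x in random_vector(rng)]
        codes[lang] = [
            f + b + r
            for f, b, r in zip(families[family], branches[branch], residual)
        ]
    return codes


if __name__ == "__main__":
    flat = flat_embeddings(TYPOLOGY)
    hier = hierarchical_embeddings(TYPOLOGY)
    for other in ("fr", "de", "ja"):
        print(
            f"es-{other}: flat={cosine(flat['es'], flat[other]):+.2f} "
            f"hierarchical={cosine(hier['es'], hier[other]):+.2f}"
        )
```

Under these assumptions, Spanish and French should come out highly similar in the hierarchical code because they share family and branch vectors, German typically lands in between, and Japanese is unrelated; the flat codes show no such pattern. A data-poor language can therefore borrow most of its representation from better-resourced relatives, which is the intuition behind typology-informed encoding.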

Key facts

  • S2ST-Omni 2 is a many-to-one compositional S2ST framework.
  • It replaces flat language labels with structured typological priors.
  • Existing S2ST systems often neglect source-language information or use flat embeddings.
  • The approach operates at three levels: hierarchical encoding, dynamic gating, and structured conditioning.
  • It aims to improve data-efficient multilingual adaptation when supervised S2ST data are scarce.
  • The paper is available on arXiv under identifier 2605.16026.
  • It addresses limitations of current SpeechLLM-based S2ST systems.
  • It proposes structured language conditioning for multilingual speech translation.
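Of the three levels listed above, dynamic gating is the easiest to picture in isolation. The Python sketch below assumes one common way to build such a gate: a sigmoid scalar, computed from the hidden state and the language code, that scales a residual adapter update. The paper's actual gate, adapter and parameterization may differ, and every name here is hypothetical.

```python
"""Minimal sketch of a dynamically gated, language-aware residual adapter.

Hypothetical: the gate form, the tanh adapter and all names are
illustrative assumptions, not the published S2ST-Omni 2 layer.
"""
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gated_language_adapter(hidden, lang_code, adapter_weights, gate_weights, gate_bias=0.0):
    """Return (hidden + g * adapter([hidden; lang_code]), g).

    g is a scalar in (0, 1) computed per input from the hidden state and
    the language code; g near 0 keeps the shared representation, g near 1
    applies the full language-specific update.
    """
    features = list(hidden) + list(lang_code)  # concatenate [hidden; lang_code]
    delta = [math.tanh(x) for x in matvec(adapter_weights, features)]
    gate_logit = sum(w * x for w, x in zip(gate_weights, features)) + gate_bias
    g = sigmoid(gate_logit)
    updated = [h + g * d for h, d in zip(hidden, delta)]
    return updated, g


if __name__ == "__main__":
    hidden = [0.2, -0.4]
    lang_code = [1.0, 0.0]  # hypothetical typology-derived language code
    adapter = [[0.5, 0.1, 0.3, 0.0], [-0.2, 0.4, 0.0, 0.6]]
    gate = [0.1, 0.1, 2.0, -1.0]
    out, g = gated_language_adapter(hidden, lang_code, adapter, gate)
    print(f"gate={g:.3f} output={[round(v, 3) for v in out]}")
```

Because the gate is computed per input, a model built this way can rely on shared parameters for one utterance and on language-specific ones for another; the residual form also means a closed gate leaves the base representation unchanged.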

Entities

Institutions

  • arXiv

Sources