S2ST-Omni 2: Structured Language Conditioning for Multilingual Speech Translation

ai-technology · 2026-05-18

Researchers propose S2ST-Omni 2, a many-to-one compositional speech-to-speech translation (S2ST) framework that replaces flat language labels with structured typological priors. Existing S2ST systems either neglect source-language information or encode each language as an independent flat embedding, and both choices overlook the systematic linguistic structure that languages share. The approach operates at three levels: typology-informed hierarchical language encoding, dynamically gated language-aware adaptation, and structured conditioning for multilingual adaptation. The stated goal is more data-efficient multilingual adaptation when supervised S2ST data are scarce. The work is published on arXiv under identifier 2605.16026.
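To make the contrast with flat language labels concrete, here is a minimal sketch of one way a hierarchical language encoding could be composed. It is not the authors' implementation: the summary does not say which typological features the paper uses, so the `TYPOLOGY` table, the family/branch/residual decomposition, the dimension `DIM`, and the function names are all assumptions for illustration. Random vectors stand in for learned embeddings.

```python
import random

# Hypothetical typology table: language -> (family, branch).
# Illustrative only; the paper's actual typological priors are not described here.
TYPOLOGY = {
    "fr": ("indo-european", "romance"),
    "es": ("indo-european", "romance"),
    "de": ("indo-european", "germanic"),
    "zh": ("sino-tibetan", "sinitic"),
}

DIM = 8  # assumed embedding size


def make_table(keys, dim, rng):
    """Return one small random vector per key (stand-in for learned embeddings)."""
    return {k: [rng.gauss(0.0, 0.1) for _ in range(dim)] for k in sorted(keys)}


rng = random.Random(0)
family_emb = make_table({f for f, _ in TYPOLOGY.values()}, DIM, rng)
branch_emb = make_table({b for _, b in TYPOLOGY.values()}, DIM, rng)
lang_emb = make_table(TYPOLOGY.keys(), DIM, rng)


def shared_part(lang):
    """The part of the language vector inherited from the typological hierarchy."""
    family, branch = TYPOLOGY[lang]
    return [f + b for f, b in zip(family_emb[family], branch_emb[branch])]


def encode_language(lang):
    """Compose a language vector as family + branch + language-specific residual."""
    return [s + r for s, r in zip(shared_part(lang), lang_emb[lang])]
```

In this sketch French and Spanish share their family and branch components and differ only in the language-specific residual, whereas a flat label scheme would give each language an unrelated vector. That sharing is the intuition behind using structure to help low-resource languages, though how the paper realizes it may differ.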

Key facts

S2ST-Omni 2 is a many-to-one compositional S2ST framework.
It replaces flat language labels with structured typological priors.
Existing S2ST systems often neglect source-language information or use flat embeddings.
The approach operates at three levels: hierarchical encoding, dynamic gating, and structured conditioning.
It aims to improve data-efficient multilingual adaptation when supervised data are scarce.
The paper is published on arXiv with identifier 2605.16026.
It addresses limitations in current SpeechLLM-based S2ST systems.
It proposes structured language conditioning for multilingual speech translation.
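The dynamic gating idea listed above can be illustrated with a generic gated residual update. This is a sketch under stated assumptions, not the paper's formulation: the scalar sigmoid gate, the parameters `gate_w`, `gate_b`, and `adapter_scale`, and the function name `gated_adapt` are all hypothetical, and the "adapter" is reduced to a scaled copy of the language vector to keep the example short.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_adapt(hidden, lang_vec, gate_w, gate_b, adapter_scale=1.0):
    """Blend a hidden state with a language-conditioned update.

    gate = sigmoid(gate_w . [hidden; lang_vec] + gate_b)
    out  = hidden + gate * (adapter_scale * lang_vec)

    hidden and lang_vec must have the same length; gate_w has twice that length.
    """
    features = hidden + lang_vec  # list concatenation, not element-wise addition
    gate = sigmoid(sum(w * x for w, x in zip(gate_w, features)) + gate_b)
    update = [adapter_scale * v for v in lang_vec]
    out = [h + gate * u for h, u in zip(hidden, update)]
    return out, gate


# Usage: with zero gate weights and zero bias the gate sits at 0.5,
# so half of the language-conditioned update is applied.
out, gate = gated_adapt([1.0, 2.0], [0.5, -0.5], [0.0] * 4, 0.0)
```

Because the gate depends on the input, the amount of language-specific adaptation can vary per utterance instead of being fixed per language, which is the general motivation for gating in this kind of design.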
