S2ST-Omni 2: Structured Language Conditioning for Multilingual Speech Translation
Researchers propose S2ST-Omni 2, a many-to-one compositional speech-to-speech translation (S2ST) framework that replaces flat language labels with structured typological priors. Existing SpeechLLM-based S2ST systems often either ignore source-language information or encode each language as an independent flat embedding, which overlooks the systematic linguistic structure that related languages share. S2ST-Omni 2 addresses this at three levels: typology-informed hierarchical language encoding, dynamically gated language-aware adaptation, and structured conditioning for multilingual adaptation. The goal is more data-efficient multilingual adaptation when supervised S2ST data are scarce. The work is published on arXiv under identifier 2605.16026.
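To make the contrast between flat labels and structured priors concrete, here is a minimal sketch. It is not the paper's implementation. It assumes a language vector is composed as the sum of a family vector, a branch vector, and a small language-specific residual. The tables `FAMILY`, `BRANCH`, `RESIDUAL`, and `TAXONOMY`, the helpers `encode_language` and `cosine`, the choice of languages, and all numeric values are hypothetical and hand-picked for illustration.

```python
import math

DIM = 4

# Hypothetical toy tables (assumption): in a trained model these would be learned.
FAMILY = {
    "indo_european": [1.0, 0.0, 0.0, 0.0],
    "japonic":       [0.0, 1.0, 0.0, 0.0],
}
BRANCH = {
    "romance":  [0.0, 0.0, 1.0, 0.0],
    "japanese": [0.0, 0.0, 0.0, 1.0],
}
# Language-specific residuals; "it" deliberately has none (a low-resource case).
RESIDUAL = {
    "es": [0.1, 0.0, 0.0, 0.1],
    "pt": [0.0, 0.1, 0.1, 0.0],
    "ja": [0.1, 0.0, 0.1, 0.0],
}
# Language code -> (family, branch)
TAXONOMY = {
    "es": ("indo_european", "romance"),
    "pt": ("indo_european", "romance"),
    "it": ("indo_european", "romance"),
    "ja": ("japonic", "japanese"),
}


def encode_language(code):
    """Compose a language vector as family + branch + residual (assumed scheme)."""
    family, branch = TAXONOMY[code]
    residual = RESIDUAL.get(code, [0.0] * DIM)  # back off to shared levels only
    return [f + b + r for f, b, r in zip(FAMILY[family], BRANCH[branch], residual)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


sim_related = cosine(encode_language("es"), encode_language("pt"))
sim_unrelated = cosine(encode_language("es"), encode_language("ja"))
print(sim_related > sim_unrelated)
```

Under this assumed composition, Spanish and Portuguese end up close because they share family and branch components, whereas a flat scheme would give them unrelated vectors. A language with no residual of its own (Italian here) still receives a usable vector from its shared ancestors, which is the kind of parameter sharing that could help when supervised data are scarce.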
Key facts
- S2ST-Omni 2 is a many-to-one compositional S2ST framework.
- It replaces flat language labels with structured typological priors.
- Existing S2ST systems often neglect source-language information or encode it as independent flat embeddings.
- The approach operates at three levels: hierarchical encoding, dynamic gating, and structured conditioning.
- It aims to improve data-efficient multilingual adaptation when supervised S2ST data are scarce.
- It is published on arXiv under identifier 2605.16026.
- It addresses limitations in current SpeechLLM-based S2ST systems.
- It proposes structured language conditioning for multilingual speech translation.
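The dynamic gating listed above can be illustrated with a generic gated-fusion sketch. The summary does not describe the actual gating architecture, so this is an assumption-laden example of the general technique only: a per-dimension sigmoid gate, computed from both the speech hidden state and the language vector, decides how much language information is added to each hidden dimension. The function `gated_adapt`, its diagonal (per-dimension) weights, and all values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_adapt(hidden, lang_vec, w_hidden, w_lang, bias):
    """Return (hidden + gate * lang_vec, gate) with a per-dimension gate in (0, 1).

    Simplifying assumption: the gate uses diagonal weights, so dimension i of the
    gate depends only on hidden[i] and lang_vec[i].
    """
    adapted, gates = [], []
    for h, l, wh, wl, b in zip(hidden, lang_vec, w_hidden, w_lang, bias):
        g = sigmoid(wh * h + wl * l + b)  # input-dependent, hence "dynamic"
        gates.append(g)
        adapted.append(h + g * l)  # residual update scaled by the gate
    return adapted, gates


# Hypothetical inputs for illustration.
hidden = [0.5, -1.0, 2.0]
lang_vec = [1.0, 1.0, 0.0]
ones = [1.0, 1.0, 1.0]
zeros = [0.0, 0.0, 0.0]

adapted, gates = gated_adapt(hidden, lang_vec, ones, ones, zeros)
print(adapted, gates)
```

Because the gate depends on the input, the same language vector can contribute strongly for one frame and weakly for another, and a zero language vector leaves the hidden state unchanged.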
Entities
Institutions
- arXiv