Language Generalization Framework for Low-Resource Varieties
A two-stage Language Generalization framework is proposed to address the neglect of low-resource language varieties in multilingual language models. Prior cross-lingual research focuses on aligning allied varieties by minimizing the differences between them; this approach instead uses linguistic dissimilarity as a cue for generalizing to unseen varieties. The framework consists of TOPPing, a source-selection method for low-resource varieties, and VACAI-Bowl, a lightweight architecture that learns variety-specific attributes in one branch and variety-invariant attributes through adversarial training. The paper is available on arXiv with ID 2605.04500.
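The summary does not say how TOPPing scores candidate sources, so the following is not the paper's method. It is a minimal sketch of the general idea of ranking candidate source varieties by a dissimilarity measure, assuming each variety is described by a numeric feature vector. The function names, the `prefer_dissimilar` switch, the use of cosine distance, and all values are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_sources(target, candidates, k=2, prefer_dissimilar=True):
    """Return the top-k (name, distance) pairs for the target vector.

    `prefer_dissimilar` is a hypothetical switch: the summary does not
    state which direction the actual selection method favors.
    """
    scored = [(name, cosine_distance(target, vec))
              for name, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=prefer_dissimilar)
    return scored[:k]


# Made-up feature vectors for three hypothetical source varieties.
candidates = {
    "variety_a": [1.0, 0.0, 1.0],
    "variety_b": [0.0, 1.0, 0.0],
    "variety_c": [1.0, 1.0, 0.0],
}
target = [1.0, 0.0, 1.0]

top = rank_sources(target, candidates, k=2)
closest = rank_sources(target, candidates, k=1, prefer_dissimilar=False)
```

Here `top` lists the two candidates farthest from the target, while `closest` shows the conventional similarity-based choice for comparison.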
Key facts
- Low-resource language varieties are neglected in Multilingual Language Models.
- Cross-lingual research typically minimizes differences between allied varieties.
- Linguistic dissimilarity is used as a cue for generalization to unseen varieties.
- The framework has two stages: TOPPing for source selection and the VACAI-Bowl architecture.
- VACAI-Bowl learns variety-specific and variety-invariant attributes.
- Adversarial training is employed for variety-invariant attributes.
- The paper is available on arXiv with ID 2605.04500.
- The approach is designed specifically for low-resource varieties.
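The two-branch idea can be illustrated with a small sketch. The summary does not specify how VACAI-Bowl implements adversarial training, so this example assumes gradient reversal, one common technique for learning invariant features: the discriminator sees the invariant features unchanged on the forward pass, and its gradient is negated before it reaches the encoder. The class and function names, the concatenation of branches, and all numbers are hypothetical.

```python
class GradientReversal:
    """Identity on the forward pass; negates and scales gradients on the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # strength of the adversarial signal (assumed hyperparameter)

    def forward(self, features):
        return list(features)

    def backward(self, upstream_grad):
        return [-self.lam * g for g in upstream_grad]


def combine_branches(specific, invariant):
    """Concatenate both feature vectors for a downstream task classifier.

    Concatenation is an assumption; the summary does not say how the
    branches are merged.
    """
    return list(specific) + list(invariant)


grl = GradientReversal(lam=0.5)

# Made-up outputs of the two branches for one input.
specific_features = [1.0, 0.3]
invariant_features = [0.2, -0.4]

# Forward: a variety discriminator would receive the invariant features as-is.
disc_input = grl.forward(invariant_features)

# Backward: a made-up gradient of the discriminator loss w.r.t. its input.
disc_grad = [0.8, -0.2]
encoder_grad = grl.backward(disc_grad)  # reversed, so the encoder is pushed to confuse the discriminator

task_input = combine_branches(specific_features, invariant_features)
```

Because the reversed gradient moves the encoder in the direction that increases the discriminator's loss, the invariant branch is encouraged to discard variety-identifying information, while the other branch remains free to keep it.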
Entities
Institutions
- arXiv