Chinese LLMs tested on 21 language variants, including regional languages
A new arXiv preprint (2504.00289v3) examines whether open-weight large language models (LLMs) developed in China support the languages spoken in China, and how they compare with Western open-weight models. The experiments cover 21 language variants (Asian regional, Chinese, and European languages) and measure two things: Information Parity and reading comprehension. The study argues that a model's language ability reveals its developers' priorities in pre-training data curation and resource allocation. Chinese developers face a tension between serving a linguistically diverse domestic population and optimizing for English-dominated global benchmarks.
Key facts
- arXiv paper 2504.00289v3 compares Chinese and Western open-weight LLMs.
- Tests cover 21 language variants including Asian regional, Chinese, and European languages.
- Experiments measure Information Parity and reading comprehension.
- Study highlights tension between domestic linguistic diversity and global English benchmarks.
- Language ability provides insights into pre-training data curation and development priorities.
- Chinese models' multilingual support is compared with that of US and European models.
- The central question is whether Chinese open-weight LLMs support languages spoken in China, including regional languages.
Entities
Institutions
- arXiv
Locations
- China
- United States
- Europe