Geopolitical bias in LLMs originates in post-training, not pre-training
A recent study published on arXiv (2605.23825) examined seven pairs of open-weight LLMs from seven different labs, each pair consisting of a base model (pre-training only) and a chat model (pre-training plus post-training). Using a paired-scenario forced-choice probe, the analysis covered 28 country pairs in English, French, and Chinese. The findings indicate that geopolitical bias arises during post-training, not pre-training. For six of the seven labs, post-training shifted model preferences toward the developer's country or region. Alibaba's Qwen 2.5 showed the largest shift: the base model was statistically neutral on China-favorability (-0.15 log-odds, p=0.15), while the chat version reached +2.91 (p<10^-4), an approximately 18-fold increase in odds relative to an even split. Other models also showed bias shifts that varied with the prompt's language. Together, these results challenge the belief that bias is solely a result of pre-training data.
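The log-odds figures quoted above can be converted into odds and probabilities with a few lines of arithmetic. The sketch below is illustrative only: it assumes the reported values are natural-log odds (the summary does not state the base), and the function names are hypothetical, not taken from the study.

```python
import math

def odds_from_log_odds(log_odds: float) -> float:
    """Convert natural-log odds to plain odds (assumes base e)."""
    return math.exp(log_odds)

def probability_from_log_odds(log_odds: float) -> float:
    """Convert natural-log odds to a probability via the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Values quoted in the summary for Qwen 2.5 (China-favorability).
base_log_odds = -0.15   # base model, not significantly different from 0
chat_log_odds = 2.91    # chat model

# Odds of the chat model relative to an even split (log-odds of 0).
chat_odds = odds_from_log_odds(chat_log_odds)

# Odds ratio between chat and base, using the base point estimate as reference.
shift_ratio = odds_from_log_odds(chat_log_odds - base_log_odds)

# Implied probability of choosing the China-favorable option.
chat_prob = probability_from_log_odds(chat_log_odds)

print(f"chat odds vs. even split: {chat_odds:.1f}")
print(f"chat/base odds ratio:     {shift_ratio:.1f}")
print(f"chat probability:         {chat_prob:.3f}")
```

Under these assumptions, the 18-fold figure matches the chat model's odds measured against an even split (exp(2.91) is about 18.4). Measured against the base model's point estimate of -0.15, the ratio is somewhat larger, about 21.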
Key facts
- Geopolitical bias in LLMs originates in post-training, not pre-training.
- Seven open-weight base/chat LLM pairs from seven labs were tested.
- The paired-scenario forced-choice probe covered 28 country pairs in English, French, and Chinese.
- For six of seven labs, post-training shifted preferences toward the developer's country or region.
- Alibaba's Qwen 2.5 showed the strongest shift: from -0.15 (base) to +2.91 (chat) log-odds of China-favorability.
- Shift magnitude depends on the language of the prompt.
- Study published on arXiv with ID 2605.23825.
Entities
Institutions
- Alibaba
- Qwen (Alibaba's model family)