Shared Confidence Features Across Languages in LLMs

ai-technology · 2026-06-01

A recent study on arXiv (2605.31220) examines whether multilingual large language models encode confidence features that transfer across languages. The researchers trained a simple linear probe on English data alone and found that it accurately predicts answer correctness zero-shot across several typologically distinct languages, with no supervision in the target language. The confidence features are concentrated mainly in the model's middle layers, which suggests a confidence subspace shared across languages. Performance depends on how similar a target language is to the source language, but the approach avoids retraining a separate estimator for each language. This addresses a significant gap: confidence estimation research has centered predominantly on English, even though LLMs are widely used in many languages.
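The probing setup can be illustrated with a minimal sketch. It assumes hidden-state vectors have already been extracted from one layer of a model and labelled 1 (answer correct) or 0 (incorrect). Here they are replaced by synthetic vectors in which a single shared axis carries the confidence signal and the "target language" differs only by an offset on the other axes. The helper names (`synthetic_language`, `train_probe`, `predict_confidence`), the logistic-regression loss, and all numbers are illustrative assumptions, not details taken from the paper.

```python
import math
import random

# Hypothetical sketch: a logistic-regression probe on synthetic "hidden states".
# Real use would extract per-layer activations from an LLM; that step is omitted.


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def synthetic_language(n, dim, lang_offset, margin, rng):
    """Synthetic data (assumption): axis 0 is a shared confidence direction;
    lang_offset shifts every example of this language on the other axes."""
    xs, ys = [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else 0  # 1 = answer was correct
        x = [rng.gauss(0.0, 1.0) + lang_offset[i] for i in range(dim)]
        x[0] += margin if y == 1 else -margin
        xs.append(x)
        ys.append(y)
    return xs, ys


def train_probe(xs, ys, lr=0.1, epochs=150):
    """Full-batch gradient descent on the mean logistic loss."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_confidence(probe, x):
    """Probe output: estimated probability that the answer is correct."""
    w, b = probe
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def accuracy(probe, xs, ys, threshold=0.5):
    hits = sum((predict_confidence(probe, x) >= threshold) == (y == 1)
               for x, y in zip(xs, ys))
    return hits / len(xs)


rng = random.Random(0)
DIM = 8
english_offset = [0.0] * DIM
target_offset = [0.0] + [1.5] * (DIM - 1)  # hypothetical "other language" shift

en_train_x, en_train_y = synthetic_language(400, DIM, english_offset, 1.5, rng)
en_test_x, en_test_y = synthetic_language(400, DIM, english_offset, 1.5, rng)
tgt_x, tgt_y = synthetic_language(400, DIM, target_offset, 1.5, rng)

probe = train_probe(en_train_x, en_train_y)  # trained on English only
en_acc = accuracy(probe, en_test_x, en_test_y)
tgt_acc = accuracy(probe, tgt_x, tgt_y)      # zero-shot: no target labels used
print(f"English held-out accuracy: {en_acc:.2f}")
print(f"Target-language zero-shot accuracy: {tgt_acc:.2f}")
```

Because the synthetic confidence signal lies on an axis shared by both languages, the English-only probe transfers; in a real model, how much of that direction is actually shared is exactly the empirical question the study addresses.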

Key facts

Study from arXiv 2605.31220
Focuses on zero-shot cross-lingual confidence estimation
Uses a lightweight linear probe trained on English
Probe generalizes to unseen languages without target-language supervision
Confidence features concentrate in middle layers across languages
Performance depends on the target language's similarity to the source language (English)
Addresses the lack of multilingual confidence estimation research
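Locating where confidence features concentrate calls for comparing layers. One hypothetical way to do this: train an English probe per layer, score a held-out target-language set, and rank layers by how well the scores separate correct from incorrect answers. AUROC is an assumed metric here (the summary does not say which metric the study reports), and the helper names, layer indices, and scores below are toy values, not results from the paper.

```python
def auroc(scores, labels):
    """AUROC: chance that a correct answer gets a higher confidence score than
    an incorrect one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_layer(scores_by_layer, labels):
    """Hypothetical helper: pick the layer whose probe scores best separate
    correct from incorrect answers."""
    per_layer = {layer: auroc(s, labels) for layer, s in scores_by_layer.items()}
    return max(per_layer, key=per_layer.get), per_layer


# Toy, made-up probe outputs on a target-language set (not results from the study).
labels = [1, 1, 0, 0]
scores_by_layer = {
    4: [0.6, 0.4, 0.5, 0.3],
    12: [0.9, 0.8, 0.2, 0.1],
    20: [0.7, 0.3, 0.6, 0.2],
}
layer, per_layer = best_layer(scores_by_layer, labels)
print(layer, per_layer)  # 12 {4: 0.75, 12: 1.0, 20: 0.75}
```

Running the same comparison once per target language would also expose the reported dependence on similarity to English, as a per-language drop in the best-layer score.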
