LLMs Show Positional Bias in Semantic Sensitivity Testing Framework
A scalable experimental framework has been developed to systematically probe how sensitive LLMs are to minor semantic variations in document-comparison tasks, framed as a needle-in-a-haystack problem. Researchers embedded single semantically altered sentences into a broader context across tens of thousands of document pairs and evaluated five LLMs, varying the perturbation type (negations, conjunction swaps, and named-entity replacements), the surrounding context (original content versus topically unrelated material), the position of the needle, and the document length. The findings indicate a within-document positional bias, distinct from previously observed candidate-order effects, with most models penalizing earlier semantic differences more harshly. Placing altered sentences in topically unrelated context systematically lowered similarity scores. The framework is described in arXiv preprint 2604.18835v1, announced as a cross-listed submission, and offers a multifaceted view of how LLMs handle subtle semantic changes in text comparison.
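The paper's own implementation is not reproduced here; the following Python sketch, using hypothetical helper names such as `build_pair` and a toy `negate` function, illustrates how a single perturbed "needle" sentence might be inserted at a controlled position within a document pair of this kind:

```python
from dataclasses import dataclass

@dataclass
class DocumentPair:
    original: str       # unmodified document
    perturbed: str      # same document with one altered "needle" sentence
    needle_index: int   # sentence position of the needle
    perturbation: str   # "negation", "conjunction_swap", or "entity_swap"

def negate(sentence: str) -> str:
    """Toy negation; a real pipeline would use a parser or an LLM rewrite."""
    if " is " in sentence:
        return sentence.replace(" is ", " is not ", 1)
    return "It is not the case that " + sentence.lower()

def build_pair(sentences: list[str], needle_index: int,
               perturbation: str = "negation") -> DocumentPair:
    """Embed a single semantically altered sentence at a controlled position."""
    altered = list(sentences)
    if perturbation == "negation":
        altered[needle_index] = negate(sentences[needle_index])
    # conjunction swaps and named-entity replacements would be handled analogously
    return DocumentPair(
        original=" ".join(sentences),
        perturbed=" ".join(altered),
        needle_index=needle_index,
        perturbation=perturbation,
    )

# Vary needle position (early, middle, late) to probe positional effects.
sentences = [f"Fact number {i} is reported here." for i in range(20)]
pairs = [build_pair(sentences, i) for i in (0, 10, 19)]
```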
Key facts
- Scalable experimental framework tests LLM sensitivity to semantic changes
- Analogized as needle-in-a-haystack problem with single altered sentences
- Five LLMs tested on tens of thousands of document pairs
- Varied perturbation types: negation, conjunction swaps, named entity replacements
- Context types: original vs. topically unrelated material
- LLMs show a within-document positional bias, penalizing earlier semantic differences more harshly (see the sketch after this list)
- Topically unrelated context systematically lowers similarity scores
- arXiv preprint 2604.18835v1 announced as a cross-listed submission
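To illustrate how the reported within-document positional bias could be surfaced, one simple analysis is to group model-assigned similarity scores by the position of the perturbed sentence. The sketch below assumes such (position, score) pairs are already available; the demo scores are placeholders, not results from the paper:

```python
from collections import defaultdict
from statistics import mean

def positional_profile(results: list[tuple[int, float]]) -> dict[int, float]:
    """results: (needle_index, similarity_score) pairs from LLM judgments.

    Returns the mean similarity per needle position; positions with lower
    means are where the model penalized the perturbation more harshly.
    """
    by_position: dict[int, list[float]] = defaultdict(list)
    for needle_index, score in results:
        by_position[needle_index].append(score)
    return {pos: mean(scores) for pos, scores in sorted(by_position.items())}

# Placeholder scores for demonstration only: a profile that rises with position
# would indicate earlier differences are penalized more, as the paper reports.
demo = [(0, 0.62), (0, 0.60), (10, 0.71), (10, 0.69), (19, 0.78), (19, 0.80)]
print(positional_profile(demo))
```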