AI Embeddings Fail to Capture Preferential Similarity for Collective Decision-Making

ai-technology · 2026-05-12

A new arXiv preprint (2605.08360) identifies a fundamental limitation of using standard text embeddings for AI-driven collective decision-making. Modern AI systems can aggregate free-form text opinions rather than fixed votes, but the paper argues that off-the-shelf embeddings measure semantic similarity, not the preferential similarity that facility location and fair clustering problems require. Under preferential similarity, the closer a text lies to a participant in embedding space, the more that participant agrees with it. The authors formalize the gap as an invariance problem: embeddings encode both preference-relevant signals (stance, values) and semantic nuances, so they fail whenever semantic and preferential similarity stop being correlated. According to the paper, standard embeddings inherit only a coarse preference signal through that correlation, which is insufficient for accurate preference aggregation.
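To make the distinction concrete, here is a minimal Python sketch of how a semantic-style distance can violate the preferential requirement on a toy example. It is not the paper's method. The two-feature representation (topic, stance), the weights, the agreement scores, and the helper names `weighted_distance` and `count_order_violations` are all assumptions invented for illustration.

```python
import math
from itertools import combinations


def weighted_distance(u, v, weights):
    """Euclidean distance after scaling each feature by its weight."""
    return math.sqrt(sum((w * (a - b)) ** 2 for a, b, w in zip(u, v, weights)))


def count_order_violations(distances, agreements):
    """Count text pairs where the more-agreed text is NOT strictly closer.

    Preferential similarity asks that higher agreement go with smaller
    distance, so each such pair breaks the requirement.
    """
    violations = 0
    for i, j in combinations(range(len(distances)), 2):
        agreement_gap = agreements[i] - agreements[j]
        distance_gap = distances[i] - distances[j]
        if agreement_gap != 0 and agreement_gap * distance_gap >= 0:
            violations += 1
    return violations


# Hypothetical toy features: (topic, stance). Not real embedding output.
participant = (0.0, 1.0)
texts = {
    "same topic, same stance": (0.0, 0.9),
    "same topic, opposite stance": (0.0, -1.0),
    "other topic, similar values": (1.0, 0.7),
}
# Assumed agreement of the participant with each text, in the same order.
agreements = [0.9, 0.1, 0.8]

# Assumed weightings: a semantic-style metric dominated by topic,
# and a preference-style metric dominated by stance.
semantic_weights = (3.0, 0.3)
preference_weights = (0.3, 3.0)

semantic_distances = [
    weighted_distance(participant, t, semantic_weights) for t in texts.values()
]
preference_distances = [
    weighted_distance(participant, t, preference_weights) for t in texts.values()
]

semantic_violations = count_order_violations(semantic_distances, agreements)
preference_violations = count_order_violations(preference_distances, agreements)

print("semantic-style violations:", semantic_violations)
print("preference-style violations:", preference_violations)
```

In this toy setup the topic-dominated metric places the opposing statement closer to the participant than a statement on another topic that they largely agree with, which is the kind of breakdown between semantic and preferential similarity the summary describes. The stance-dominated metric keeps distance and agreement in inverse order.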

Key facts

arXiv paper 2605.08360 addresses AI collective decision-making
Standard text embeddings measure semantic similarity, not preferential similarity
Preferential similarity requires an inverse relation between agreement and distance
Off-the-shelf embeddings fail when semantic-preferential correlation breaks
Problem formalized as an invariance problem in embedding models
Embeddings encode both preference-relevant and semantic signals
Facility location and fair clustering rely on preferential similarity
Paper proposes new approach for embedding preferences
