AI Embeddings Fail to Capture Preferential Similarity for Collective Decision-Making
A new arXiv preprint (2605.08360) argues that standard text embeddings are poorly suited to AI-driven collective decision-making. Modern AI systems can aggregate free-form text opinions instead of fixed votes, but off-the-shelf embeddings measure semantic similarity, not the preferential similarity that facility location and fair clustering problems require. Under preferential similarity, a participant's agreement with a text is inversely related to their distance from it: the closer the text, the stronger the agreement. The authors formalize the gap as an invariance problem. Embeddings encode both preference-relevant signals (stance, values) and semantic nuances that are irrelevant to preference, so distances stay informative only as long as semantic and preferential similarity remain correlated, and they mislead once that correlation breaks. The study concludes that standard embeddings inherit only a coarse preference signal through this correlation, which is insufficient for accurate preference aggregation.
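The gap between semantic and preferential similarity can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's method: the three-dimensional vectors are hand-made, the first coordinate is assumed to carry stance and the other two to carry topic and wording, and `stance_distance` is a hypothetical helper introduced only for illustration.

```python
import math

def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def stance_distance(u, v):
    """Hypothetical preference-only distance: compares the stance coordinate alone."""
    return abs(u[0] - v[0])

# Assumed layout: [stance, topic_a, topic_b]; stance is +1 (support) or -1 (oppose).
participant = [1.0, 3.0, 0.0]          # supports the proposal, writes about topic A
same_topic_opposed = [-1.0, 3.0, 0.0]  # opposes the proposal, also about topic A
other_topic_aligned = [1.0, 0.0, 3.0]  # supports the proposal, but about topic B

# Full-vector distance is dominated by the topic coordinates.
d_opposed = cosine_distance(participant, same_topic_opposed)   # roughly 0.2
d_aligned = cosine_distance(participant, other_topic_aligned)  # roughly 0.9

print("semantic: opposed text is closer:", d_opposed < d_aligned)
print("stance only: aligned text is closer:",
      stance_distance(participant, other_topic_aligned)
      < stance_distance(participant, same_topic_opposed))
```

In this toy setup the opposing statement looks closer to the participant than the agreeing one because the two share a topic, so distance no longer tracks agreement. Real embeddings do not expose a clean stance coordinate, which is why the problem is hard in practice.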
Key facts
- arXiv paper 2605.08360 addresses AI collective decision-making
- Standard text embeddings measure semantic similarity, not preferential similarity
- Preferential similarity requires an inverse relation between a participant's agreement with a text and their distance from it
- Off-the-shelf embeddings fail when semantic-preferential correlation breaks
- Problem formalized as an invariance problem in embedding models
- Embeddings encode both preference-relevant and semantic signals
- Facility location and fair clustering rely on preferential similarity
- Paper proposes new approach for embedding preferences
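The dependence of facility location on preferential similarity, noted in the key facts above, can be sketched as a 1-median selection: choose the candidate statement with the smallest total distance to all participants. The sketch below is illustrative only and is not taken from the paper. The two-dimensional vectors, the `pick_representative` function, and the `stance_only` distance are all hypothetical, and the first coordinate is assumed to encode stance.

```python
import math

def pick_representative(participants, candidates, distance):
    """Return the index of the candidate with the smallest total distance (1-median)."""
    totals = [sum(distance(p, c) for p in participants) for c in candidates]
    return min(range(len(candidates)), key=totals.__getitem__)

def stance_only(u, v):
    """Hypothetical preferential distance: looks at the stance coordinate only."""
    return abs(u[0] - v[0])

# Assumed layout: (stance, topic). Two participants support, one opposes.
participants = [(1.0, 0.0), (1.0, 0.5), (-1.0, 0.0)]

# Candidate 0 opposes but shares the participants' topic;
# candidate 1 supports but is phrased around a distant topic.
candidates = [(-1.0, 0.0), (1.0, 5.0)]

semantic_choice = pick_representative(participants, candidates, math.dist)
preferential_choice = pick_representative(participants, candidates, stance_only)

print("chosen with full-vector Euclidean distance:", semantic_choice)
print("chosen with stance-only distance:", preferential_choice)
```

With the full-vector distance, the selection favors the on-topic statement that most participants oppose; with the stance-only distance, it favors the statement the majority agrees with. The example is deliberately extreme to show how the choice of distance changes the aggregated outcome.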
Entities
Institutions
- arXiv