AI Shopping Recommendations Vary Wildly With Slight Phrasing Changes

ai-technology · 2026-05-28

A recent study finds that AI assistants recommend substantially different brands when users rephrase the same purchase intent. Researchers ran approximately 6,000 paraphrase tests and about 6,000 same-prompt rerun controls on models from OpenAI and Anthropic. The Jaccard similarity between the brand sets recommended for two paraphrases of the same intent was only 0.288 for cosmetic rewordings and 0.135 for rewordings that added constraints. Both figures are well below the 0.50–0.61 baseline observed when the identical prompt was simply rerun. The findings suggest that the exact prompt wording, rather than the underlying buyer intent, largely determines which brands appear. Increasing reasoning effort did not close this gap, because its effect stayed within ±0.05. This raises doubts about the effectiveness of AEO/GEO (Answer Engine Optimization / Generative Engine Optimization).
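Jaccard similarity compares two sets as |A ∩ B| / |A ∪ B|. A score of 1.0 means the two runs recommended exactly the same brands, and 0.0 means they recommended no brands in common. The sketch below shows how such a score could be computed for two recommendation lists. It is only an illustration. The study does not describe how it extracted or normalized brand names, so the case and whitespace normalization here is an assumption. The brand names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two recommendation lists.

    Assumption: brand names are normalized by stripping whitespace and
    lowercasing; the study's actual normalization is not described.
    """
    set_a = {x.strip().lower() for x in a}
    set_b = {x.strip().lower() for x in b}
    if not set_a and not set_b:
        return 1.0  # convention: two empty recommendation sets count as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical brand lists returned for two paraphrases of the same intent.
run_1 = ["BrandA", "BrandB", "BrandC", "BrandD"]
run_2 = ["BrandB", "BrandE", "BrandF", "BrandG"]
print(round(jaccard(run_1, run_2), 3))  # 1 shared brand out of 7 distinct -> 0.143
```

In this example, one shared brand out of seven distinct brands gives a score of about 0.14. That is roughly the level the study reports for constraint-adding rewordings.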

Key facts

Study tested ~6,000 paraphrase runs and ~6,000 same-prompt rerun controls
Model providers: OpenAI and Anthropic
Jaccard similarity for cosmetic rewordings: 0.288 (95% CI [0.215, 0.361])
Jaccard similarity for constraint-adding rewordings: 0.135 (95% CI [0.098, 0.175])
Same-prompt rerun baseline: 0.50–0.61
Effect of increasing reasoning effort on similarity: bounded within ±0.05
Prompt string dominates brand selection over buyer intent
Challenges AEO/GEO practices
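The key facts give 95% confidence intervals for the mean Jaccard scores, but the study's CI method is not stated here. The sketch below uses a percentile bootstrap over per-pair scores, which is one common way to produce such an interval. This choice is an assumption, not necessarily what the researchers did. The `scores` list is hypothetical and is not the study's data.

```python
import random
from statistics import mean


def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-pair Jaccard scores.

    Assumption: pairs are resampled independently with replacement; the
    study's actual interval method is not described.
    """
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(scores)
    # Mean of each bootstrap resample, sorted to read off percentiles.
    means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(scores), (lo, hi)


# Hypothetical per-pair Jaccard scores; NOT the study's data.
scores = [0.1, 0.25, 0.4, 0.2, 0.33, 0.0, 0.5, 0.3, 0.15, 0.6]
point, (lo, hi) = bootstrap_ci(scores)
print(f"mean={point:.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```

Fixing the seed makes the interval reproducible. With thousands of scored pairs, as in the study, the interval would be much narrower than with this ten-item toy list.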
