AI Chatbots Provide Inaccurate Medical Information Despite Authoritative Tone, Study Reveals
Researchers at the University of Tübingen assessed five AI chatbots (ChatGPT, Gemini, Grok, Meta AI, and DeepSeek) and found notable inaccuracies in their health-related answers. Across 50 medical questions, nearly 20% of responses were rated highly problematic, 50% problematic, and 30% somewhat problematic. Grok performed worst, with 58% of its responses rated problematic, followed by ChatGPT at 52% and Meta AI at 50%. The chatbots struggled most with questions about nutrition and athletic performance, and open-ended queries proved especially difficult, with 32% of them rated highly problematic. The study, published in BMJ Open, reported a median completeness score of just 40% for the scientific references the chatbots provided, and its authors urge users to verify health information independently.
Key facts
- Five AI chatbots were tested: ChatGPT, Gemini, Grok, Meta AI, and DeepSeek
- Researchers asked 50 health questions across five medical domains
- Two experts independently rated all answers
- Nearly 20% of answers were highly problematic, 50% problematic, 30% somewhat problematic
- Only two questions out of 250 were refused by the chatbots
- Grok performed worst with 58% problematic responses
- Chatbots achieved median reference completeness score of just 40%
- Study published in BMJ Open using February 2025 free versions
Entities
People
- Carsten Eickhoff
Institutions
- University of Tübingen
- BMJ Open
- Nature Medicine
- JAMA Network Open
- Nature Communications Medicine
- The Conversation
Locations
- Tübingen
- Germany