ARTFEED — Contemporary Art Intelligence

AI Chatbots Provide Inaccurate Medical Information Despite Authoritative Tone, Study Reveals

ai-technology · 2026-04-21

Researchers at the University of Tübingen assessed five AI chatbots—ChatGPT, Gemini, Grok, Meta AI, and DeepSeek—and found notable inaccuracies in their health-related answers. Across 50 medical questions, roughly 20% of responses were rated highly problematic, 50% problematic, and 30% somewhat problematic. Grok performed worst, with 58% of its responses rated problematic, followed by ChatGPT at 52% and Meta AI at 50%. The chatbots struggled most with questions about nutrition and athletic performance, and open-ended queries fared worst, with 32% of responses rated highly problematic. Published in BMJ Open, the study also reported a median completeness score of just 40% for the chatbots' scientific references and urged users to independently verify health information.

Key facts

  • Five AI chatbots were tested: ChatGPT, Gemini, Grok, Meta AI, and DeepSeek
  • Researchers asked 50 health questions across five medical domains
  • Two experts independently rated all answers
  • Nearly 20% of answers were highly problematic, 50% problematic, and 30% somewhat problematic
  • Only two of the 250 answers were refusals
  • Grok performed worst with 58% problematic responses
  • Chatbots achieved median reference completeness score of just 40%
  • Study published in BMJ Open using February 2025 free versions

Entities

People

  • Carsten Eickhoff

Institutions

  • University of Tübingen

Publications

  • BMJ Open
  • Nature Medicine
  • JAMA Network Open
  • Nature Communications Medicine
  • The Conversation

Locations

  • Tübingen
  • Germany

Sources