T Rosenbluth, NY Times 2/9/26: Health Advice From A.I. Chatbots Is Frequently Wrong, Study Shows
An excerpt:
A new study published Monday provided a sobering look at whether A.I. chatbots, which have fast become a major source of health information…
The experiment found that the chatbots were no better than Google — already a flawed source of health information — at guiding users toward the correct diagnoses or helping them determine what they should do next. And the technology posed unique risks, sometimes presenting false information or dramatically changing its advice depending on slight changes in the wording of the questions…
The models have passed medical licensing exams and have outperformed doctors on challenging diagnostic problems.
But Adam Mahdi, a professor at the Oxford Internet Institute and senior author of the new Nature Medicine study, suspected that these clean, straightforward medical questions were not a good proxy for how well they worked for real patients…
So he and his colleagues set up an experiment. More than 1,200 British participants, most of whom had no medical training, were given a detailed medical scenario, complete with symptoms, general lifestyle details and medical history. The researchers told the participants to chat with the bot to figure out the appropriate next steps, like whether to call an ambulance or self-treat at home. They tested commercially available chatbots like OpenAI’s ChatGPT and Meta’s Llama.
The researchers found that participants chose the “right” course of action — predetermined by a panel of doctors — less than half of the time… They were no better than the control group, who were told to perform the same task using any research method they would normally use at home, mainly Googling…
Participants didn’t enter enough information or the most relevant symptoms, and the chatbots were left to give advice with an incomplete picture of the problem… By contrast, when researchers entered the full medical scenario directly into the chatbots, they correctly diagnosed the problem 94 percent of the time…
Even when researchers typed in the medical scenario directly, they found that the chatbots struggled to correctly distinguish when a set of symptoms warranted immediate medical attention or non-urgent care.
My take: AI chatbots can be quite helpful and continue to improve. This study and the NY Times summary highlight some of their limitations. Notably, even small changes in the wording of a prompt can considerably alter a chatbot’s advice.
Related blog posts:
- Dr. Jennifer Lee: AI for Peds GI
- AI for GI
- Artificial Intelligence in the Endoscopy Suite
- The Future of Medicine: AI’s Role vs Human Judgment
- ChatGPT4 Outperforms GI Docs for Postcolonoscopy Surveillance Advice
- Answering Patient Questions: AI Does Better Than Doctors