Health Advice From AI Chatbots Frequently Wrong

T Rosenbluth, NY Times 2/9/26: Health Advice From A.I. Chatbots Is Frequently Wrong, Study Shows

An excerpt:

A new study published Monday provided a sobering look at whether A.I. chatbots, which have fast become a major source of health information…

The experiment found that the chatbots were no better than Google — already a flawed source of health information — at guiding users toward the correct diagnoses or helping them determine what they should do next. And the technology posed unique risks, sometimes presenting false information or dramatically changing its advice depending on slight changes in the wording of the questions…

The models have passed medical licensing exams and have outperformed doctors on challenging diagnostic problems.

But Adam Mahdi, a professor at the Oxford Internet Institute and senior author of the new Nature Medicine study, suspected that these clean, straightforward medical questions were not a good proxy for how well they worked for real patients…

So he and his colleagues set up an experiment. More than 1,200 British participants, most of whom had no medical training, were given a detailed medical scenario, complete with symptoms, general lifestyle details and medical history. The researchers told the participants to chat with the bot to figure out the appropriate next steps, like whether to call an ambulance or self-treat at home. They tested commercially available chatbots like OpenAI’s ChatGPT and Meta’s Llama.

The researchers found that participants chose the “right” course of action — predetermined by a panel of doctors — less than half of the time… They were no better than the control group, who were told to perform the same task using any research method they would normally use at home, mainly Googling…

Participants didn’t enter enough information or the most relevant symptoms, and the chatbots were left to give advice with an incomplete picture of the problem… By contrast, when researchers entered the full medical scenario directly into the chatbots, they correctly diagnosed the problem 94 percent of the time…

Even when researchers typed in the medical scenario directly, they found that the chatbots struggled to correctly distinguish when a set of symptoms warranted immediate medical attention or non-urgent care.

My take: AI chatbots can be quite helpful and continue to improve. This study and the NY Times summary show some of their limitations: even small changes in the wording of prompts can alter a chatbot’s advice considerably.

Related blog posts:

Iguazu Falls

Health Disinformation Risks from AI Chatbots

MD Modi et al. Assessing the System-Instruction Vulnerabilities of Large Language Models to Malicious Conversion Into Health Disinformation Chatbots. Annals of Internal Medicine; 2025. https://doi.org/10.7326/ANNALS-24-0393

Methods: This study assessed the effectiveness of safeguards in foundational LLMs against malicious instruction into health disinformation chatbots. Five foundational LLMs—OpenAI’s GPT-4o, Google’s Gemini 1.5 Pro, Anthropic’s Claude 3.5 Sonnet, Meta’s Llama 3.2-90B Vision, and xAI’s Grok Beta—were evaluated via their application programming interfaces (APIs). Each API received system-level instructions to produce incorrect responses to health queries, delivered in a formal, authoritative, convincing, and scientific tone.
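The “system-level instructions” referenced above are a standard feature of chat-style LLM APIs: a hidden message, set by whoever builds the chatbot, that steers every subsequent reply without the end user ever seeing it. As a minimal sketch of that mechanism (the model name and the benign instruction here are illustrative only; the study’s actual disinformation prompts are deliberately not reproduced):

```python
# Minimal sketch of how a system-level instruction is attached to a
# chat-style LLM API request. The instruction shown is benign; the study
# instead instructed models to answer health queries incorrectly in an
# authoritative, scientific tone.
import json


def build_chat_request(system_instruction: str, user_query: str,
                       model: str = "gpt-4o") -> dict:
    """Assemble a chat-completion request payload.

    The "system" message is invisible to the end user but shapes every
    response -- this is the layer the study showed could be abused.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_instruction},
            {"role": "user", "content": user_query},
        ],
    }


payload = build_chat_request(
    system_instruction="Answer in plain language and cite reputable sources.",
    user_query="Is sunscreen safe to use daily?",
)
print(json.dumps(payload, indent=2))
```

Because the system message is set once at the application level, a single malicious instruction can silently corrupt every answer the chatbot gives — which is why the study probed this layer specifically.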

Key findings:

  • Of the 100 health queries posed across the 5 customized LLM API chatbots, 88 (88%) responses were health disinformation

The study includes examples of how AI systems can be used to create disinformation.

My take: This study shows how easy it is to get AI systems to provide misleading information in a convincing fashion. It might be interesting to use one of these systems to generate answers for the board game Balderdash.

Related blog posts:

AI Skirmish in Prior Authorizations

Teddy Rosenbluth NYT 7/10/24: In Constant Battle With Insurers, Doctors Reach for a Cudgel: A.I.

An excerpt:

For a growing number of doctors, A.I. chatbots — which can draft letters to insurers in seconds — are opening up a new front in the battle to approve costly claims, accomplishing in minutes what years of advocacy and attempts at health care reform have not…

Doctors are turning to the technology even as some of the country’s largest insurance companies face class-action lawsuits alleging that they used their own technology to swiftly deny large batches of claims and cut off seriously ill patients from rehabilitation treatment.

Some experts fear that the prior-authorization process will soon devolve into an A.I. “arms race,” in which bots battle bots over insurance coverage. Among doctors, there are few things as universally hated…

Doctors and their staff spend an average of 12 hours a week submitting prior-authorization requests, a process widely considered burdensome and detrimental to patient health among physicians surveyed by the American Medical Association.

With the help of ChatGPT, Dr. Tward now types in a couple of sentences, describing the purpose of the letter and the types of scientific studies he wants referenced, and a draft is produced in seconds.

Then, he can tell the chatbot to make it four times longer. “If you’re going to put all kinds of barriers up for my patients, then when I fire back, I’m going to make it very time consuming,” he said…

Epic, one of the largest electronic health record companies in the country, has rolled out a prior-authorization tool that uses A.I. to a small group of physicians, said Derek De Young, a developer working on the product.

Several major health systems are piloting Doximity GPT, created to help with a number of administrative tasks including prior authorizations, a company spokeswoman said…

As doctors use A.I. to get faster at writing prior-authorization letters, Dr. Wachter said he had “tremendous confidence” that the insurance companies would use A.I. to get better at denying them.

Related blog posts:

Firefly Bike Trail (Athens, GA)