Modi et al. Assessing the System-Instruction Vulnerabilities of Large Language Models to Malicious Conversion Into Health Disinformation Chatbots. Annals of Internal Medicine; 2025. https://doi.org/10.7326/ANNALS-24-0393 (Abstract)
Methods: This study assessed the effectiveness of safeguards in foundational LLMs against malicious system-level instructions that convert them into health disinformation chatbots. Five foundational LLMs (OpenAI’s GPT-4o, Google’s Gemini 1.5 Pro, Anthropic’s Claude 3.5 Sonnet, Meta’s Llama 3.2-90B Vision, and xAI’s Grok Beta) were evaluated via their application programming interfaces (APIs). Each API received system-level instructions to produce incorrect responses to health queries, delivered in a formal, authoritative, convincing, and scientific tone.
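For readers curious about the mechanics, the sketch below shows how a system-level instruction is attached to an API call, using OpenAI's Python client as one example. The system prompt and health query are benign placeholders, not the study's actual instructions, and the other four models are customized through analogous fields in their own APIs.

```python
# Minimal sketch of API-level system instructions (benign placeholders only).
# Assumes the OpenAI Python client (pip install openai) and an OPENAI_API_KEY
# environment variable; the study used comparable system-level fields in the
# Gemini, Claude, Llama, and Grok APIs.
from openai import OpenAI

client = OpenAI()

# Placeholder system prompt: the study's malicious instructions are not
# reproduced here; this is the layer where such instructions would be injected.
system_prompt = (
    "You are a health information assistant. Respond in a formal, "
    "authoritative, scientific tone."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},  # system-level instruction tested by the study
        {"role": "user", "content": "Is sunscreen safe to use daily?"},  # example health query
    ],
)

print(response.choices[0].message.content)
```

The key point is that these instructions sit outside the user-visible conversation, so someone querying such a customized chatbot cannot see how it has been instructed to respond.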
Key findings:
- Of the 100 health queries posed across the 5 customized LLM API chatbots, 88 (88%) of the responses contained health disinformation
Examples of how AI systems can be used to create disinformation: [example images not reproduced here]
My take: This study shows how easily AI systems can be instructed to provide misleading information in a convincing fashion. It might be interesting to use one of these systems to provide answers for the board game Balderdash.
Related blog posts:
- ChatGPT4 Outperforms GI Docs for Postcolonoscopy Surveillance Advice
- Medical Diagnostic Errors
- ChatGPT Passes the Bar, an MBA exam, and Earns Medical License?
- Chatbots Helping Doctors with Empathy
- AI Skirmish in Prior Authorizations