Boston Study Finds AI Chatbots Often Miss Diagnoses

Boston Docs Put AI Chatbots To The Test, And The Results Aren’t Pretty

Boston researchers just put some of the world’s most popular AI chatbots through their medical paces, and the verdict is not exactly confidence-boosting for anyone using them as a stand-in for a doctor.

In a new study published in JAMA Network Open, researchers tested 21 off-the-shelf large language models on 29 stepwise clinical vignettes, generating more than 16,000 responses. The models often landed on the correct final diagnosis once they were given all the details. Earlier in the process, though, they struggled badly. Failure rates for generating a complete differential diagnosis topped 80% in many cases, a gap the authors say hides serious problems in the models’ reasoning.

Those concerns line up with what patient safety experts are already worried about. Misuse of AI chatbots is listed as the top health technology hazard for 2026 in a report from ECRI, which warns that chatbots can suggest unsafe treatments, push unnecessary tests, or amplify existing biases in medical data. The group urges health systems to build real AI governance, train clinicians on the tools’ limits, and audit performance on a regular basis.

On the front lines, some Boston doctors are even blunter. Dr. Marc Succi, the study’s corresponding author and executive director of the MESH Incubator at Mass General Brigham, told NBC Boston, “I don’t think there’s any chatbot publicly available, or medically specific, that is remotely as good as a physician.”

Plenty of people are turning to the bots anyway. A recent poll from Gallup found that roughly one in four Americans say they have already used an AI tool for health information or advice, a reminder that the technology is racing ahead in everyday use even as its medical performance remains shaky.

How to use chatbots safely

Local clinicians say there is still a role for chatbots, as long as people treat them like an extra research tool instead of a digital doctor. They can help you gather questions, look up background information, and organize your thoughts before an appointment. What they should not do is tell you whether to skip that appointment.

ECRI recommends that patients always verify chatbot guidance with a clinician. The group also urges hospitals and clinics to put formal governance and auditing in place so they can catch errors and biases before patients are harmed.

If you do use a chatbot, the study’s findings point to a few practical precautions. Save the conversation, jot down anything you think might be missing, and bring that record to your provider. Do not make urgent or high-stakes medical decisions based only on what the bot tells you.

The authors conclude in the JAMA Network Open paper that, despite version-by-version improvements, off-the-shelf chatbots “are not ready for unsupervised clinical-grade deployment” and call for stronger validation and clinician oversight before wider use in patient-facing settings. For now, Boston clinicians and patient safety groups say the safest move is to treat chatbots like a reasonably smart research assistant, not a licensed clinician, and to run any worrying advice past a real doctor.