Gary Marcus (@GaryMarcus)

AI Summary
  • Large language models are frequently wrong yet sound confident
  • Misuse of LLMs in medicine can lead to serious consequences
  • Studies show chatbots' medical information has low accuracy
#AI #Medicine #LLM


Please don’t trust your chatbot for medical advice. 🙏

Remember how I used to say that large language models are “frequently wrong, never in doubt”, and how I warned three years ago on 60 Minutes that they were purveyors of “authoritative bullshit” that should not be trusted? That’s still true – and it very much applies in medicine.

And that matters, a lot, because a large fraction of the population has begun to turn to chatbots for medical advice. Two relevant new studies are reported today in the Washington Post, in a damning article.

The first new study, published by BMJ (affiliated with the British Medical Association) in a peer-reviewed journal and entitled “Generative artificial intelligence-driven chatbots and medical misinformation: an accuracy, referencing and readability audit”, examined five popular chatbots (Gemini, DeepSeek, Meta AI, ChatGPT and Grok) about one year ago, prompting each with 10 questions on topics ranging from cancer to vaccines and nutrition, in open-ended dialogues, and reported that nearly half of the responses were highly problematic. Worse, “chatbot outputs were consistently expressed with confidence and certainty”. The responses were also filled with hallucinations and fabricated citations. All of this – the hallucinations, mistakes, and overconfidence – is entirely typical of LLMs, and entirely problematic in medicine. As the authors put it, in somewhat academic language, but entirely accurately, “continued deployment without public education and oversight risks amplifying misinformation.”

The second new study, published in JAMA Network Open (affiliated with the American Medical Association) and called “Large Language Model Performance and Clinical Reasoning Tasks”, looked at 21 frontier models across 29 questions, and reported that “despite progress, current LLMs remain limited in early diagnostic reasoning and cannot yet be relied on for unsupervised patient-facing clinical decision-making.”

And the Post article actually reported only part of the new scientific literature on LLMs and medicine. Two other new studies that it missed only add to the concerns.*

There will always be better models, but for now, and until proven otherwise, we should not take the apparent “confidence” of large language models – itself an illusion produced by how they are trained – to mean that we should trust large language models with our lives.

*Read the full version, including discussion of the other studies, all with links, at my newsletter Marcus on AI; you can subscribe for free.