Enabling a new model for healthcare with AI co-clinician
AI co-clinician: researching the path toward AI-augmented care — Google DeepMind
April 30, 2026 · Science
Alan Karthikesalingam, Vivek Natarajan and Pushmeet Kohli
Health systems worldwide are striving for better outcomes, lower costs, and an improved experience for both patients and clinicians. However, progress is constrained by a global shortage of clinical experts, with the World Health Organization predicting a shortfall of more than 10 million health workers by 2030.
While AI is often seen as the key to bridging this gap, it has not yet been able to fully meet the needs of clinicians and patients. That's why, today, we are announcing our AI co-clinician research initiative, to explore how AI could better amplify doctors’ expertise and deliver higher quality care to patients.
At Google DeepMind, our journey in medical AI has evolved from mastering examination-style tests of medical knowledge with Med-PaLM, to matching physician performance in text-based simulated medical consultations with AMIE, including in real-world feasibility trial settings. We also have a long history of studying how clinicians and AI systems might work together.
We hypothesize that the next evolution of healthcare delivery will entail “triadic care” where AI agents can help patients in their care journeys under the clinical authority of their physician. Medicine has always been a team sport, and AI agents can bring more teammates onto the field: extending clinicians' reach while ensuring they retain judgment and control.
This serves as the foundation of our AI co-clinician research initiative: AI designed to function as a collaborative member of the care team that interacts with patients under expert clinical supervision. We designed and evaluated AI co-clinician in both clinician- and patient-facing settings. Addressing both perspectives is key for AI to enhance the quality, cost, availability and experience of care delivery.
Advances in medical AI research aimed at making systems more trustworthy and helpful for clinicians in assisting patients.
Augmenting clinicians with AI co-clinician
For a physician, a tool is useful only if it is trustworthy and factually grounded. We therefore researched how well AI co-clinician might support clinicians by surfacing high-quality evidence.
In collaboration with academic physicians, we adapted the "NOHARM" framework to test our AI for "errors of commission" (incorrect information) and "errors of omission" (failure to surface critical information).
In head-to-head blind evaluations, physicians consistently preferred AI co-clinician’s responses over those of leading evidence synthesis tools. In an objective analysis of 98 realistic primary care queries, our system recorded zero critical errors in 97 of the 98 cases, improving over two AI systems widely used by physicians.
The study used a blind comparison of 98 realistic primary care queries, which were curated from a diverse range of sources and subsequently refined by a panel of attending physicians. This multi-step iterative process involved comprehensive background research and the development of query-specific answer metrics to enable a rigorous professional assessment of clinical accuracy and compliance with best practice guidance. By leveraging this expert-led refinement phase, the methodology allowed for a precise characterization of consensus scenario-specific errors of omission and commission, ensuring that the evaluation reflected the complexities of real-world clinical decision-making.
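As an illustration of this kind of evaluation, tallying errors of commission and omission across graded responses can be sketched as below. The `GradedResponse` schema, the `tally_clean_responses` helper, and the numbers are hypothetical stand-ins, not the study’s actual grading pipeline or data.

```python
# Hypothetical sketch: counting responses free of "errors of commission"
# (incorrect information) and "errors of omission" (missed critical
# information). Schema and numbers are illustrative only.
from dataclasses import dataclass

@dataclass
class GradedResponse:
    query_id: str
    commission_errors: int  # incorrect statements flagged by graders
    omission_errors: int    # critical information the response failed to surface

def tally_clean_responses(responses):
    """Count responses with zero critical errors of either kind."""
    clean = sum(
        1 for r in responses
        if r.commission_errors == 0 and r.omission_errors == 0
    )
    return clean, len(responses)

# Toy usage: three graded queries, one with an omission error.
graded = [
    GradedResponse("q1", 0, 0),
    GradedResponse("q2", 0, 1),
    GradedResponse("q3", 0, 0),
]
clean, total = tally_clean_responses(graded)
print(f"{clean}/{total} responses free of critical errors")  # → 2/3 responses free of critical errors
```

Separating the two error types matters: a response can be entirely accurate yet still unsafe if it omits a critical warning, so both counts must be zero for a response to be considered clean.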
Beyond reliable synthesis of clinical evidence, AI systems should answer queries about medications and therapeutic interventions with the precision that doctors demand. This is a difficult task for AI, yet it remains underexplored. To address this gap, we evaluated AI co-clinician on the OpenFDA set of RxQA questions, a challenging benchmark designed to assess complex medication knowledge and reasoning. We saw significant progress in navigating these tests, surpassing other frontier AI systems, especially when questions were posed in the open-ended way they’re asked in real care. The findings underscore the potential for advanced AI to provide helpful assistance as clinicians navigate the increasingly data-intensive requirements of care planning and management.
RxQA was originally posed as a multiple-choice question (MCQ) test in which even primary care physicians scored modestly. While our results show significant improvements in AI systems’ MCQ performance on the openly available (OpenFDA) set of RxQA, clinicians’ needs in the real world present as open-ended questions rather than a need to identify the correct answer from pre-determined options. On this more realistic clinical task of open-ended question-answering about medications, AI co-clinician outperforms available frontier models. Taken together, these results show that AI can mirror human physician proficiency in such aspects of clinical reasoning, with opportunities for further improvement.
Researching AI co-clinician’s real-time multimodal capabilities in telemedical settings
Beyond assistive clinician-facing settings, we are also investigating how AI co-clinician performs within patient-facing research contexts. Expert clinical assessment traditionally includes subtle visual and auditory cues, such as observing a patient’s gait, the nuances of respiratory patterns, or the appearance of skin changes. While prior studies (including our work with Beth Israel Deaconess Medical Center) demonstrated value in AI text-chats before a doctor’s appointment, restricting interactions to text fundamentally constrains the clinical value of AI. Medicine isn’t just text; it requires eyes, ears and a voice.
This is why we are exploring the potential for real-time multimodal AI as an assistive component of the care team. Building on the capabilities of Gemini and Project Astra, we tested the capabilities of AI co-clinician to use live audio and video to engage with patients, simulating telemedical calls where capable AI could one day support better diagnosis and management under expert supervision. Further details regarding our methodology and results are available in our technical report: “Towards Conversational Medical AI with Eyes, Ears and a Voice”.
Working with academic physicians at Harvard and Stanford, we designed a randomized simulation study with 20 synthetic clinical scenarios and 10 physician "patient-actors". The agent demonstrated new capabilities beyond text-only systems, such as guiding patients through complex physical examinations in real time. For example, it successfully corrected a patient's inhaler technique and guided shoulder maneuvers to identify a rotator cuff injury.
While there is frequent discussion regarding AI’s potential to match or exceed human clinical performance, these high-fidelity simulations more rigorously evaluate that premise. We assessed over 140 aspects of consultation skill and found that expert physicians performed better than the AI system overall, particularly in identifying "red flags" and guiding critical physical examinations. This finding suggests these systems are currently best used as supportive tools for practitioners rather than replacements for clinical judgment. At the same time, our work highlights the significant progress in AI’s capabilities: AI co-clinician performed at a level comparable to or exceeding primary care physicians (PCPs) in 68 of the 140 assessed areas. The results underscore extensive promise and flag specific areas where further research can most impactfully advance medical AI.
Results from a randomized, interface-blinded, crossover simulation study involving 120 hypothetical telemedical encounters performed by real primary care physicians, the AI co-clinician or GPT-realtime. For the evaluation, a pool of internal medicine residents served as patient actors, enacting 20 standardized outpatient scenarios. These scenarios, spanning a range of clinical conditions, were specifically designed to require proactive auditory and visual reasoning. Scenario-tailored criteria assessed seven domains of consultation quality, with each item using anchored 0–2 scoring to distinguish omissions, partial completion, and fully appropriate performance. Error bars correspond to 95% confidence intervals.
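The anchored 0–2 scoring with 95% confidence intervals described above can be illustrated with a small sketch. The item scores below are invented, and the percentile bootstrap is one standard way to obtain such intervals; it is an assumption here, not the study’s stated method.

```python
# Illustrative summary of anchored 0-2 item scores for one consultation-
# quality domain: 0 = omitted, 1 = partially completed, 2 = fully
# appropriate. Scores and the bootstrap approach are assumptions.
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def mean(xs):
    return sum(xs) / len(xs)

def bootstrap_ci(scores, n_boot=2000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the mean score."""
    means = sorted(
        mean(random.choices(scores, k=len(scores)))  # resample with replacement
        for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Hypothetical item scores for one domain across ten assessed items.
domain_scores = [2, 2, 1, 2, 0, 1, 2, 2, 1, 2]
lo, hi = bootstrap_ci(domain_scores)
print(f"mean={mean(domain_scores):.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

Anchored scoring keeps graders consistent (every item has the same 0/1/2 meaning), which is what makes per-domain means comparable across the physician, AI co-clinician, and GPT-realtime arms.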
Below you can see the research team role-playing as hypothetical patients in this telemedical setting with the AI co-clinician, highlighting the system’s potential capabilities and limitations.
These videos are for research purposes only and do not involve real patients. They are being shared to demonstrate the capabilities and limitations of the technology today. Our initial research collaborations do not involve the depicted capabilities, which are not intended for use in the diagnosis, cure, mitigation, treatment, or prevention of disease, or to provide medical advice.
Engineering trust with safeguards for clinical-grade AI
Transitioning and deploying AI into clinical environments requires uncompromising architectural and operational safeguards. In our research on simulations of patient-facing telemedical conversations, AI co-clinician uses a dual-agent architecture: a "Planner" module continuously monitors the conversation, verifying that the "Talker" agent stays within safe clinical boundaries.
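A minimal sketch of such a dual-agent loop is shown below. The actual AI co-clinician architecture is not described in implementation detail here; `talker_generate`, `planner_review`, and the unsafe-topic list are hypothetical stand-ins for model calls and safety policy.

```python
# Hypothetical sketch of a "Talker"/"Planner" dual-agent loop: the Talker
# drafts each utterance, and the Planner reviews it against safety policy
# before anything reaches the patient. All names here are illustrative.

UNSAFE_TOPICS = {"definitive diagnosis", "prescription change"}

def talker_generate(transcript):
    """Placeholder for the conversational model's next utterance."""
    return "Based on what you describe, let's review your inhaler technique."

def planner_review(transcript, draft):
    """Placeholder for the supervisory model: block drafts that cross
    safe clinical boundaries and escalate to the physician instead."""
    if any(topic in draft.lower() for topic in UNSAFE_TOPICS):
        return "I'd like to bring your doctor in before we go further."
    return draft

def respond(transcript):
    draft = talker_generate(transcript)
    return planner_review(transcript, draft)

print(respond(["Patient: My inhaler doesn't seem to help."]))
```

The design choice worth noting is that the Planner sits in the response path rather than auditing after the fact, so an out-of-bounds draft never reaches the patient.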
Similarly, to meet doctors’ needs, AI co-clinician prioritizes clinical-grade evidence, verifying and citation-checking retrieved material. The evaluations we report above were constructed by physicians to mirror a range of their real-world evidence needs, formulating questions from hypothetical scenarios to rigorously evaluate AI’s capabilities.
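The idea of citation checking for retrieval can be sketched as follows. Production systems would use entailment or grounding models rather than substring matching; `verify_citations` and the example claims are illustrative assumptions only.

```python
# Naive illustrative citation check: every claim must quote text that
# actually appears in its cited source. Real systems would use semantic
# entailment, not substring matching; this is a stand-in.

def verify_citations(answer_claims, sources):
    """Return (claim, source_id) pairs whose quoted evidence is not
    found in the cited source text."""
    failures = []
    for claim, source_id, quote in answer_claims:
        source_text = sources.get(source_id, "")
        if quote not in source_text:
            failures.append((claim, source_id))
    return failures

sources = {
    "guideline-1": "Adults with suspected asthma should have spirometry.",
}
claims = [
    ("Spirometry is recommended", "guideline-1",
     "suspected asthma should have spirometry"),
    ("Imaging is first-line", "guideline-1", "chest CT is first-line"),
]
print(verify_citations(claims, sources))  # unsupported claims are flagged
```

Flagging unsupported claims before they are shown turns hallucinated citations from a silent failure into a detectable one.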
Research collaborations for rigorous real-world evaluation of AI co-clinician
To further develop and assess AI co-clinician, we are currently advancing a phased approach with academic and research collaborators across globally diverse healthcare settings including in the US, India, Australia, New Zealand, Singapore and UAE.
As we progress through these evaluation phases, we will extend our research to more geographies, working with mission-aligned healthcare organizations and academic medical centers. Our goal is to ensure that medical AI is developed and deployed responsibly, in line with applicable standards, supporting better health worldwide.
_Note: Our research collaborations are not, at this stage, intended for use in the diagnosis, cure, mitigation, treatment, or prevention of disease, or to provide medical advice._
Acknowledgements
We are grateful to our research partners at Harvard Medical School and Stanford Medicine and the many medical centers and care organizations engaging in further trusted tester evaluations with our team. This project involved collaborations with many teams at Google DeepMind, Google Research, Google Cloud and Google for Health, and we thank our teammates for insightful discussions and contributions.
In particular, AI co-clinician would not have been possible without the core research and engineering efforts of Aniruddh Raghu, Arthur Chen, Charlie Taylor, CJ Park, David Stutz, Devora Berlowitz, Doug Fritz, Dylan Slack, Eliseo Papa, Jack Chen, JD Velasquez, Jing Rong Lim, Katya Tregubova, Kelvin Guu, Meet Shah, Richard Green, Ryutaro Tanno, Sukhdeep Singh, Victoria Johnston, Adam Rodman.
We thank our many collaborators for their invaluable contributions, including Ali Eslami, Aliya Rysbeck, Andy Song, Anil Palepu, Anna Cupani, Bakul Patel, Bibo Xu, Brett Hatfield, David Wu, Ed Chi, Emma Cooney, Erica Oppenheimer, Erwan Rolland, Euan A. Ashley, Francesca Pietra, Rebeca Santamaria-Fernadez, Gordon Turner, Gregory Wayne, Hannah Gladman, Irene Teinemaa, Jack O'Sullivan, Jacob Koshy, Jan Freyberg, Jason Gusdorf, Joelle Wilson, Katherine Tong, Juraj Gottweis, Michael Howell, Mili Sanwalka, Pavel Dubov, Pete Clardy, Peter Brodeur, Rachelle Sico, SiWai Man, Sumanth Dahathri, Taylan Cemgil, Tim Strother, Uchechi Okereke, Valentin Lievin, Vishnu Ravi, Yana Lunts, Yun Liu, Simon Staffell, Rachel Teo, Adriana Fernandez Lara, Armin Senoner, Danielle Breen, Paula Tesch, Leen Verburgh, Dimple Vijaykumar, Juanita Bawagan, Muinat Abdul, Mariana Montes and Rob Ashley. Feature videos were produced by Christopher Godfree, Matt Mager, Emma Moxhay and Simon Waldron.
Thanks to James Manyika and Demis Hassabis for their insightful guidance and support throughout the research process.
Related posts: [Developing reliable AI tools for healthcare](http://deepmind.google/blog/codoc-developing-reliable-ai-tools-for-healthcare/) (July 2023, Research)