How good are ‘AI doctors’ — and will they take over medicine?

NEWS
03 June 2026

Nature talks to specialists about whether people might soon be diagnosed by artificial-intelligence systems.

By Mariana Lenharo

At least one clinical trial is in the works that aims to test the ability of an ‘AI doctor’ to take a patient’s information and propose a diagnosis. Credit: andrei_r/Getty

‘AI outperforms doctors in emergency room tasks’

‘Google AI has better bedside manner than human doctors’

Headlines such as these that tout the abilities of artificial-intelligence tools to surpass the skills of physicians are becoming more common.

But an advanced large language model (LLM) beating a physician at a single task doesn’t necessarily mean that AI is ready to take over medicine in the real world. Nature spoke to researchers studying the use of AI in health care to understand which ‘AI doctors’ have shown the most promise so far, and when such tools might take command of medical diagnoses. Some scientists point out that various AI systems are already handling simple medical tasks, such as taking notes and even renewing prescriptions, but they say that physicians can never fully be replaced by machines.

“Medicine is messy and patients don’t always have textbook stories to tell,” says David Wu, a resident physician who studies AI at Harvard Medical School in Boston, Massachusetts. “I don’t think we’ve proven that these systems can handle that mess.”

Up to the test

Still, a few demonstrations have got researchers excited about the AI revolution brewing in medicine. One study, published in April in the journal Science [1], concluded that an advanced LLM performed better than physicians when evaluating the conditions of people visiting the emergency department at a Boston hospital.
When the AI model (called o1 and developed by OpenAI in San Francisco, California) reviewed the information recorded by hospital staff members during a visit, it got the diagnosis correct or almost correct in 67% of cases, compared with around 50–55% for the two human doctors who participated in the experiment.

Because the study used real-world data, it marks an evolution for AI tools, which have in the past been tested on simulated patient scenarios or neatly curated medical cases, say researchers who spoke to Nature. But it’s still a long way from emulating what goes on in a real emergency department, they say. For example, neither the AI model nor the doctors in the study had the opportunity to interact with the patients.

Another study, posted on the preprint server arXiv in March [2] ahead of peer review, has also created a buzz by investigating how AI systems do when conversing with patients to make a diagnosis. A team led by scientists at Google Research in Mountain View, California, monitored an AI system that they developed, named the Articulate Medical Intelligence Explorer (AMIE), as it used text messages to chat with real patients who had been scheduled for urgent-care appointments at a clinic in Boston. The interactions, during which AMIE collected patients’ histories and discussed possible diagnoses, occurred up to five days before their appointments with human physicians.

AMIE then generated a list of possible diagnoses on the basis of those conversations. The correct diagnosis was among the chatbot’s top three suggestions in 75% of cases and it was the top suggestion in 56% of cases.
The system’s performance was similar to that of the actual physicians who the patients eventually saw, although the treatment plans proposed by the human clinicians were more practical and cost-effective than were those proposed by AMIE.

Ready for prime time?

These two studies show how much medical AI has evolved in the past three years, says Robert Wachter, a physician at the University of California, San Francisco, who is the author of a book on how AI is transforming health care. He explains that during that period, LLMs have gone from succeeding at simple tasks, such as passing multiple-choice medical exams, to matching physicians’ diagnoses in complex cases when fed the necessary information. “That’s pretty exciting,” he says.

doi: https://doi.org/10.1038/d41586-026-01691-6

References

1. Brodeur, P. G. et al. Science 392, 524–527 (2026).
2. Brodeur, P. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2603.08448 (2026).
3. Korom, R. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2507.16947 (2025).
4. Wu, D. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2512.01241 (2025).