A Harvard study just found AI can now out-diagnose physicians in the ER: ‘We’re already at the ceiling’

AI has entered your office, your kid’s classroom, and the courtroom. Now, it’s coming for the hospital.

A recent study from a team led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center compared emergency room diagnoses from OpenAI’s o1-preview against those offered by two internal medicine attending physicians. The competing sets of diagnoses were then assessed by two other attending physicians, who didn’t know which results came from humans and which from AI. The results favored AI.

“We tested the AI model against virtually every benchmark, and it eclipsed both prior models and our physician baselines,” Arjun Manrai, a senior co-author of the study and an assistant professor of biomedical informatics at Harvard’s Blavatnik Institute, said in a statement.

AI has already started to reshape the medical field. Google DeepMind’s AlphaFold is advancing biological research. Some emergency rooms have deployed generative AI to take notes and create medical records. And in Utah, an AI system is even prescribing medicine to patients without a physician in the loop (although physicians have noted this process could put patients at risk). The Harvard study is the latest evidence that AI models are increasingly capable of performing tasks crucial to the medical profession, producing results that shocked even the study’s own researchers.

“I thought it was going to be a fun experiment but that it wouldn’t work that well. That was not at all what happened,” Adam Rodman, a senior author of the study and a Beth Israel doctor, said in a statement.

How AI is reshaping medicine

This study’s findings were particularly jarring because, unlike past AI medical studies, the researchers didn’t clean up the data. Each case was presented exactly as it appeared in an electronic health record.

Peter Brodeur, a study co-author and a Harvard clinical fellow in medicine at Beth Israel Deaconess, said in a statement that AI models are increasingly capable.
“We used to evaluate models with multiple-choice tests; now they are consistently scoring close to 100%, and we can’t track progress anymore because we’re already at the ceiling,” he said.

To be sure, the researchers noted the results don’t mean AI is ready to replace physicians. Brodeur said that while AI is good at diagnosing, it also tends to suggest unnecessary testing that could actually do more harm than good.

But AI is impacting physician decision-making. In a separate study published last December, a team of researchers found that 67% of physicians who initially recommended against treatment for a patient changed their decision after AI suggested the opposite.

That’s despite the fact that there’s no formal framework for accountability when it comes to AI diagnoses and that the technology falls short of gaining patient trust, Rodman told The Guardian.

Patients still “want humans to guide them through life or death decisions [and] to guide them through challenging treatment decisions,” he said.

This story was originally featured on Fortune.com