Millions of people are turning to artificial intelligence (AI) chatbots for advice on everything from cooking to tax returns. Increasingly, they are also asking chatbots about their health. But as the UK’s chief medical officer recently warned, that may not be wise when it comes to medical decisions.

In a recent study, colleagues and I tested how well large language model (LLM) chatbots help the public deal with common health problems. The results were striking: the chatbots we tested were not ready to act as doctors.

A common response to studies like this is that AI moves faster than academic publishing. By the time a paper appears, the models tested may already have been updated. But studies using newer versions of these systems for patient triage suggest the same problems remain.

We gave participants brief descriptions of common medical situations. They were randomly assigned either to use one of three widely available chatbots or to rely on whatever sources they would normally use at home. After interacting with the chatbot, we asked two questions: what condition might explain the symptoms? And where should they seek help?

People who used chatbots were less likely to identify the correct condition than those who didn’t. They were also no better at determining the right place to seek care than the control group. In other words, interacting with a chatbot did not help people make better health decisions.

Strong knowledge, weak outcomes

This does not mean the models lack medical knowledge: LLMs can pass medical licensing exams with ease. When we removed the human element and gave the same scenarios directly to the chatbots, their performance improved dramatically. Without human involvement, the models identified relevant conditions in the vast majority of cases and often suggested appropriate levels of care.

So why did the results deteriorate when people actually used the systems? When we looked at the conversations, the problems emerged.
Chatbots frequently mentioned the relevant diagnosis somewhere in the conversation, yet participants did not always notice or remember it when summarising their final answer. In other cases, users provided incomplete information or the chatbot misinterpreted key details. The issue was not simply a failure of medical knowledge – it was a failure of communication between human and machine.

The study shows that policymakers need information about the real-world performance of technology before introducing it into high-stakes settings such as frontline healthcare.

Our findings highlight an important limitation of many current evaluations of AI in medicine. Language models often perform extremely well on structured exam questions or simulated “model-to-model” interactions. But real-world use is much messier. Patients describe symptoms in vague or incomplete ways and can misunderstand explanations. They ask questions in unpredictable sequences. A system that performs impressively on benchmarks may behave very differently once real people begin interacting with it.

[Image: AI may be better used as a medical secretary. ST_Travel/Shutterstock]

It also underscores a broader point about clinical care. As a GP, my job involves far more than recalling facts. Medicine is often described as an art rather than a science. A consultation isn’t simply about identifying the correct diagnosis. It involves interpreting a patient’s story, exploring uncertainty and negotiating decisions.

Medical educators have long recognised this complexity. For decades, future doctors have been taught using the Calgary–Cambridge model. This means building a rapport with the patient, gathering information through careful questioning, understanding the patient’s concerns and expectations, explaining findings clearly and agreeing a shared plan for management. All these processes rely on human connection: tailored communication, clarification, gentle probing, and judgement shaped by context and trust.
These qualities cannot easily be reduced to pattern recognition.

A different role for AI

Yet the lesson from our study is not that AI has no place in healthcare. Far from it. The key is understanding what these systems are currently good at and where their limitations lie.

One useful way to think about today’s chatbots is that they function more like secretaries than physicians. They are remarkably effective at organising information, summarising text and structuring complex documents. These are the kinds of tasks where language models are already proving useful within healthcare systems, for example in drafting clinical notes, summarising patient records or generating referral letters.

The promise of AI in medicine remains real, but its role is likely to be more supportive than revolutionary in the near term. Chatbots should not be expected to act as the front door to healthcare. They are not ready to diagnose conditions or direct patients to the right level of care.

Artificial intelligence may be able to pass medical exams. But just as passing a theory test doesn’t make you a competent driver, practising medicine involves far more than answering questions correctly. It requires judgement, empathy and the ability to navigate the complexity that sits behind every clinical encounter. For now, at least, that requires people rather than bots.

Rebecca Payne works on the Health and Care Research Wales funded REMEDY project and also receives funding from a University of Oxford Clarendon-Reuben Scholarship. She is a Fellow of the Royal College of General Practitioners and a Senior Fellow of the Faculty of Medical Leadership and Management.