Earlier this year, computer scientist Guillaume Cabanac received a notification from Google Scholar that one of his publications had been cited in a paper published in the International Dental Journal [1]. That was unexpected, because his research on spotting fabricated papers doesn’t typically intersect with dentistry. “I was very surprised to see that I couldn’t recognize my own reference,” says Cabanac, who is based at the University of Toulouse in France.

The title in the citation resembled that of a preprint [2] he had posted in 2021 and never published formally, but the journal was listed as Nature, and the DOI (the unique identifier assigned by publishers and preprint repositories) did not lead to the original preprint. “I got very concerned,” adds Cabanac, who immediately suspected that the citation had been hallucinated by artificial intelligence.

This is just one example of a rapidly growing problem. Surveys and related studies have shown that researchers are increasingly using large language models (LLMs) to help to conduct literature searches, write manuscripts and format bibliographies. And sometimes, these models generate non-existent academic references.

Over the past year, efforts have begun turning up such hallucinated citations in the literature. One analysis of nearly 18,000 papers accepted by three computer-science conferences found a sharp increase in references that cannot be traced to actual scholarly publications [3]. The results, reported in January, indicated that 2.6% of papers in 2025 had at least one potentially hallucinated citation, up from about 0.3% in 2024.
Another analysis, released in February, estimated that 2–6% of papers in four other 2025 computer-science conferences included references with rephrased titles, or citations of publications that the authors couldn’t verify by searching through databases and journal archives [4].

And although the scale of the problem remains uncertain, it is clearly not confined to conferences. An exclusive analysis conducted by Nature’s news team, in collaboration with Grounded AI, a company based in Stevenage, UK, suggests that at least tens of thousands of 2025 publications, including journal papers and books as well as conference proceedings, probably contain invalid references generated by AI.

Grounded AI is among the companies offering publishers tools for screening submissions for problematic references. Several publishers told Nature reporters that they have been exploring such tools or developing in-house versions.

But some researchers are concerned that the problem will soon get out of hand. “We’re going to see a flood of fake references,” says Alison Johnston, a political scientist at Oregon State University in Corvallis.

Another issue is deciding what to do about hallucinated citations that make it into the published literature. That’s a problem that academic publishers are wrestling with right now.

Sources of error

Citation errors are not new to academic publishing. “Even before generative AI, we already had so many inaccuracies in citations,” says Mohammad Hosseini, who studies research ethics and integrity at Northwestern University Feinberg School of Medicine in Chicago, Illinois. Typical issues have included misspelt authors’ names and errors in the year of publication, the title of the journal or the DOI, as well as discrepancies between the information in the cited work and the details given by the paper citing it [5,6].

“Now the problem is not just inaccuracy, it’s about fake citations.
It’s about fabricated citations, which is a whole different problem,” says Hosseini.

Publishers told Nature that they are seeing increases in the number of fabricated and inaccurate citations in submissions, and that they are taking steps to tackle the issue.

Johnston, co-lead editor of the Review of International Political Economy (RIPE), a journal published by the UK-based Taylor & Francis, says that she rejected 25% of some 100 submissions in January “because of fake references”. She uses the plagiarism-detection software iThenticate to flag unusual or partial matches between the references in submitted papers and published bibliographies, then manually checks the suspicious citations. “I’m doing things now to try and detect hallucinated references that I wasn’t doing prior to 2025,” she says.

Frontiers, based in Lausanne, Switzerland, has developed an in-house AI tool for flagging integrity issues at the point of submission, including references to irrelevant or retracted work and hallucinated citations. “Around 5% [of manuscripts] show potential reference-related issues flagged through our checks,” says Elena Vicario, Frontiers’ head of research integrity. But “not all flagged references ultimately turn out to be genuinely problematic”, she adds. That makes it challenging, Vicario says, to come up with a precise measure of the prevalence of any of these types of citation issue.

Experiments using AI chatbots to generate papers have provided insights into how often LLMs produce citation errors and what types of error they tend to make. In one study, researchers prompted OpenAI’s GPT-4o LLM to generate six literature reviews on three mental-health disorders, and analysed the 176 references in those synthetic reviews [7]. Under these experimental conditions, they found that nearly 20% were fabricated references that could not be linked to actual research.
And 45% of the remaining references, which corresponded to genuine publications, contained errors, often incorrect or invalid DOIs.

In some cases, including in references in published articles, all of the component parts are made up, says Kathryn Weber-Boer, director of scientometrics at the London-based company Digital Science. (The firm is operated by the Holtzbrinck Publishing Group, which is the majority shareholder of Springer Nature, which publishes Nature. Nature’s news team is editorially independent of its publisher.) AI also hallucinates DOIs, both in references that are otherwise genuine and in fabricated ones, she adds.

AI-generated references commonly combine fragments of genuine publications, say researchers who have studied the issue (see ‘How fakes can look real’). Joe Shockman, co-founder and chief executive of Grounded AI, calls such references ‘Frankenstein’ citations, likening their assembly to that of the fictional monster. “It looks real to a human being, but is not actually a reference to a real thing,” says Shockman, who is based in Ashland, Oregon.

[Graphic: ‘How fakes can look real’. Source: Ref. 7]

Although some types of error seem to implicate AI, others are less clear-cut, say researchers. “In today’s landscape, we have to recognize that there are human errors and there are machine errors, and those can often overlap,” says Weber-Boer.

Published problems

How many hallucinated citations are showing up in published research remains difficult to discern. To get an estimate, Nature’s news team joined forces with Grounded AI, which has developed an AI tool called Veracity that checks citations against scholarly databases and across the web, flagging ones that are invalid, are irrelevant or cite retracted work.

Nature and Grounded AI collaborated to analyse more than 4,000 publications from last year, covering five leading publishers: Elsevier, Sage, Springer Nature, Taylor & Francis and Wiley.
Grounded AI randomly sampled these papers from Europe PMC, a repository of open-access biomedical research articles, and the bibliometric database Crossref, to include an equal number of publications per month from each of the five publishers. The sample included published papers as well as book chapters and conference proceedings, and it cut across all subject areas in these publishers’ portfolios.

Grounded AI’s tool looks for an exact match to a reference, or the closest match it can find. It then flags citations with major issues, such as mismatched titles or DOIs, missing authors and incorrect journals, as well as more-minor issues. Citations that pointed to papers that couldn’t be found even though they should have been easy to find (because the journal in question is indexed by scholarly databases, for example) were marked as especially problematic.

After running the publications through the tool, Grounded AI assigned a risk score to each of the published papers, on the basis of the number of references that had major issues and how likely those issues were to have been generated by AI. Grounded AI determined that likelihood using data gleaned from a separate analysis in which two AI models generated 20,000 synthetic papers; this allowed the company to identify the most common types of citation error that AI makes.

Nature manually checked the 100 most suspicious publications and confirmed that 65 contained at least one invalid reference, meaning one that pointed to a publication that did not seem to exist (see ‘Finding the fabrications’). But 22 of the 100 most-suspicious papers had references that did point to genuine publications.

For the remaining 13 papers, it was unclear whether all their citations pointed to existing research.
These 13 papers included, for example, references to articles said to be published in regional journals in languages other than English, and references with metadata mismatches that looked like plausible human errors.

The analysis, which looked at reference lists from Crossref and full text from Europe PMC publications, turned up no clear trend across publishers. Each of the selected publishers had more than five publications with references that manual checks couldn’t validate.

As a rough estimate, if the rate of 65 publications with at least one invalid reference out of some 4,000 publications analysed holds across the academic literature, it would suggest that more than 110,000 of the 7 million or so scholarly publications from 2025 contain invalid references.

Nick Morley, Grounded AI’s co-founder and chief product officer, says that the types of citation problem seen in 2025 are different from those found by his team before the proliferation of LLMs. This, he says, points to the use of AI as a leading culprit.

The true number of hallucinated references is almost certainly higher, says Weber-Boer, because the analysis focused on big publishers, which have more resources for checking citations systematically than do smaller publishers. Fields such as computer science, which has seen a surge in the use of LLMs to produce manuscripts [8], might be more affected than others. What’s more, the Grounded AI analysis turned up a few hundred more publications that had some risk of hallucinated citations, suggesting that extra manual checking would have brought more such citations to light.

Spokespeople for all five publishers said that they check references as part of their screening and editing processes, and that they intend to investigate the publications flagged by the Nature analysis.
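The rough extrapolation behind that figure can be written out explicitly. This is only an indicative sketch using the article’s approximate inputs (the sample was in fact slightly larger than 4,000 publications, and the 7-million figure is a round estimate):

```python
# Scale the sampled rate of invalid references to the whole 2025 literature,
# using the article's approximate figures.
flagged = 65            # sampled publications confirmed to have >= 1 invalid reference
sample_size = 4_000     # approximate size of the analysed sample
total_2025 = 7_000_000  # rough count of scholarly publications from 2025

# Integer arithmetic keeps the result exact: 65/4,000 of 7 million.
estimate = flagged * total_2025 // sample_size
print(f"Estimated publications with invalid references: {estimate:,}")
# Prints 113,750, consistent with the article's "more than 110,000".
```

Because both inputs are approximations, the estimate is best read as an order of magnitude rather than a precise count.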
A spokesperson for Taylor & Francis said that some of the publications flagged were already under investigation by its ethics and integrity team.

When it comes to hallucinated references, “there have been cases where authors have been able to clearly document where issues have occurred in the process of producing a manuscript, for example using a translation tool, and demonstrate that the rest of the paper can be relied upon, in which case the paper will be corrected,” says Chris Graf, Springer Nature’s research-integrity director. More often, though, these references reflect broader problems with the content, he says.

Shockman says that the number of potentially problematic citations flagged by Veracity is an order of magnitude greater when it is used in pilot programmes to screen submissions on behalf of publishers than when it analyses published work. This suggests that publishers are catching a large proportion of such citations before they can make it into the literature.

Nature’s collaboration with Grounded AI also highlighted, as many experts have noted, that the detection of invalid citations with automated tools is not error-free. One challenge is that journals format references in a variety of styles, and AI tools might fail to recognize a reference because of how it is styled. These types of problem showed up among citations that manual checks determined to be genuine despite having been flagged by Grounded AI.

Another issue, says Weber-Boer, is that large-scale bibliometric databases might not index references that can’t be verified, meaning that their metadata might not match what appears on publishers’ websites. Some references also lack their corresponding DOI, which makes it hard for automated tools to identify the cited paper, she adds.
“We’re starting to get a handle on the characteristics of this problem, which are a precursor to understanding the scale of it,” she says.

The Grounded AI team members acknowledge that not all the references their tool flags will be true positives, but say that they are continuing to improve its performance. IOP Publishing, based in Bristol, UK, is now using Grounded AI’s tool to screen submissions for problematic citations across all of its proprietary journals, says Kim Eggleton, head of peer review and research integrity. “We know it’s a problem, we just don’t know how big the problem is,” she says.

Fake-citation fallout