How AI Is Polluting the Scientific Landscape

Thousands of peer-reviewed papers now contain AI-fabricated citations. Here's what the data says, why it qualifies as misconduct, and what researchers stand to lose.

Radomir Grcic

4/6/2026 · 4 min read

AI hallucination visualized as a distorted brain illustration

In April of last year, in a significant editorial oversight, Springer Nature published the $169 machine learning textbook “Mastering Machine Learning: From Basics to Advanced” with citations to works and authors that simply didn't exist, according to Retraction Watch.

The case reignited debate over the use of AI in scientific research, and over its role in polluting a research ecosystem that, in an era of fast-disseminated information, is already struggling to stay clean.

When Citations Lie: The Hallucination Problem

Large language models like ChatGPT don't search literature databases the way researchers do. They generate text based on patterns in training data. Ask one to cite its sources, and it will produce something that looks like a citation, complete with author names, journal titles, volume numbers, and DOIs. The only thing missing is any guarantee it's real.

This is what researchers call an AI hallucination: content that sounds plausible but is factually fabricated. In the context of citations, hallucinations are defined as references that paraphrase or recombine metadata from one or more real sources, such as titles, authors, or publication venues, into a reference that never actually existed.

A Nature news article, well worth reading in full, tells the story of computer scientist Guillaume Cabanac, who experienced this firsthand. He received a Google Scholar alert notifying him that one of his papers had been cited in the International Dental Journal, a field his research had never crossed paths with. His paper existed only as a preprint and was never formally published, yet the citation carried the fabricated metadata of a fully published article.

The mechanics behind this aren't mysterious. Hallucinations occur because LLMs are pattern predictors, not knowledge retrievers. When pressed to produce citations, they invent titles and DOIs that fit the pattern of what a real citation looks like. Contributing factors include overfitting during training, biases in training data, and the sheer complexity of these models, all of which make confident-sounding fabrication entirely possible, and often undetectable at a glance.
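The flip side is that fabricated references are often machine-checkable even when they pass a human skim. As a rough illustration, here is a minimal Python sketch, our own example rather than any journal's official tooling, that asks the public Crossref REST API whether a DOI is actually registered:

```python
import requests

def doi_exists(doi: str) -> bool:
    """Return True if the DOI is registered with Crossref.

    A rough screen only: a resolving DOI doesn't prove the citation
    matches the paper it claims to be, and some legitimate DOIs live
    in other registries (e.g. DataCite), so treat a miss as
    "needs manual review" rather than proof of fabrication.
    """
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}",
        headers={"User-Agent": "citation-check-sketch (mailto:you@example.org)"},
        timeout=10,
    )
    return resp.status_code == 200

# Hypothetical usage: screen a reference list and flag suspects.
for doi in ["10.1038/s41586-020-2649-2", "10.9999/fake.doi.123"]:
    status = "found" if doi_exists(doi) else "NOT FOUND, review manually"
    print(doi, "->", status)
```

A resolving DOI is only a first filter; a thorough check would also compare the returned title and author metadata against the reference as cited, since hallucinated citations sometimes borrow a real DOI.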

The Concerning Numbers

This isn't a fringe problem anymore, and the data from 2025 alone makes that clear.

The study reported in that Nature article found that 2.6% of papers published in 2025 contained at least one suspected AI-hallucinated citation. That's nearly nine times the rate from the previous year, when the figure sat at 0.3%. A separate February study examined four computer science conferences and found that between 2% and 6% of papers contained references with altered titles, or citations that couldn't be verified through any database or journal archive.

The scale becomes harder to ignore when you look at conference submissions. An investigation into papers submitted to the 2026 International Conference on Learning Representations (ICLR), one of the most prestigious venues in machine learning, found that 20% of a 300-paper sample contained at least one AI hallucination. A GPTZero hallucination scan identified 50 submissions with at least one fabricated citation that peer reviewers had failed to flag.

Meanwhile, a paper published in Nature Human Behaviour analyzed more than one million preprints and published articles, finding a significant and rapidly increasing presence of LLM-modified text across disciplines, with double-digit adoption rates in computer science specifically.

Fake Citations Are a Form of Scientific Misconduct

Citations aren't just bibliographic housekeeping. They are the verifiable links in the chain of scientific progress, the mechanism through which claims are grounded in evidence and results are made reproducible.

When those links are fabricated, the chain breaks.

Fabricating references or misrepresenting the literature isn't a minor formatting error. As IEEE and ACM both recognize, citation fabrication is indistinguishable from falsification, one of the established categories of scientific misconduct. The fact that an algorithm did the fabricating, rather than a human researcher, doesn't change the outcome. A fake reference is a fake reference, regardless of its origin.

When hallucinated citations make it through peer review, and they clearly do, they enter the permanent record. Other researchers may cite them. Conclusions built on invented evidence compound. The knowledge infrastructure quietly corrupts itself.

What AI Editing Costs You Beyond the Citations

The citation problem is the most visible symptom, but it's not the only one. At Quillcademia, we've been watching how AI editing affects the actual writing, not just the references.

In a recent blog post, we highlighted findings published in PLoS ONE: large language models make up to three times more corrections than human editors, and 61% of those corrections were rated as improvements. That sounds impressive until you consider what happens in the remaining 39%.

LLMs tend to replace a far larger fraction of the original text, substituting the model's preferred vocabulary for the author's own. What gets lost in that process is not just stylistic, but often semantic. Intended meaning is quietly overwritten, and the author may not notice until a reviewer flags it.
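That replaced fraction is something you can measure for yourself. As a quick sketch (our own illustration with invented example sentences, not a method from the PLoS ONE study), Python's standard difflib can estimate how much of the author's original wording survives an edit:

```python
import difflib

def surviving_fraction(original: str, edited: str) -> float:
    """Estimate the fraction of the original words that survive,
    in order, in the edited version.

    SequenceMatcher finds the longest runs of matching words; their
    total length is a crude proxy for how much of the author's own
    wording an editor (human or LLM) left intact.
    """
    orig_words = original.split()
    edit_words = edited.split()
    matcher = difflib.SequenceMatcher(a=orig_words, b=edit_words)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(orig_words), 1)

# Invented example sentences, for illustration only.
before = "Our results suggest the effect is modest but consistent across sites."
after = "The findings demonstrate a modest yet consistent effect across sites."
print(f"{surviving_fraction(before, after):.0%} of the original wording survives")
```

Run on a manuscript before and after an AI editing pass, a score like this makes the scale of substitution visible at a glance.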

They paraphrase, reorder, and rephrase based on patterns in their training data. When asked to "improve" or "academicize" a passage, a model may restructure phrasing in ways that closely mirror existing published work, without flagging the similarity and without any intent on the author's part. The result can be unintended plagiarism.

That's why we believe a seasoned human editor still matters. They bring years of field-specific knowledge and an intuitive grasp of what a sentence is trying to do, and they substitute selectively, preserving the author's vocabulary and leaving the overall structure of the argument intact.

The Case for Keeping Humans in the Loop

None of this means AI has no place in research workflows. But it does mean that treating it as a citation generator, a ghostwriter, or a substitute for genuine editorial judgment carries real costs to the credibility, accuracy, and integrity of the scientific record.

The knowledge landscape needs to stay clean. Right now, the tools we've handed that responsibility to are the same ones manufacturing the mess.