When neuropsychologist Bernhard Sabel put his new fake-paper detector to work, he was “shocked” by what it found. After screening some 5000 papers, he estimates up to 34% of neuroscience papers published in 2020 were likely made up or plagiarized; in medicine, the figure was 24%. Both numbers, which he and colleagues report in a medRxiv preprint posted on 8 May, are well above levels they calculated for 2010—and far larger than the 2% baseline estimated in a 2022 publishers’ group report.
“It is just too hard to believe” at first, says Sabel of Otto von Guericke University Magdeburg and editor-in-chief of Restorative Neurology and Neuroscience. It’s as if “somebody tells you 30% of what you eat is toxic.”
His findings underscore what was widely suspected: Journals are awash in a rising tide of scientific manuscripts from paper mills—secretive businesses that allow researchers to pad their publication records by paying for fake papers or undeserved authorship. “Paper mills have made a fortune by basically attacking a system that has had no idea how to cope with this stuff,” says Dorothy Bishop, a University of Oxford psychologist who studies fraudulent publishing practices. A 2 May announcement from the publisher Hindawi underlined the threat: It shut down four of its journals that it found were “heavily compromised” by articles from paper mills.
Sabel’s tool relies on just two indicators—authors who use private, noninstitutional email addresses, and those who list an affiliation with a hospital. It isn’t a perfect solution, because of a high false-positive rate. Other developers of fake-paper detectors, who often reveal little about how their tools work, contend with similar issues.
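The two-indicator screen is simple enough to sketch in code. The following is an illustrative reconstruction, not Sabel's actual tool; the domain suffixes and the keyword match on affiliations are assumptions made for the example:

```python
# Hypothetical sketch of a two-indicator screen like the one described above.
# The institutional-domain list and the "hospital" keyword are illustrative
# assumptions, not the rules used in the preprint.

INSTITUTIONAL_DOMAINS = (".edu", ".ac.uk", ".uni-magdeburg.de")  # example suffixes only

def flags(author_email: str, affiliation: str) -> list[str]:
    """Return the red flags raised for one corresponding author."""
    raised = []
    domain = author_email.rsplit("@", 1)[-1].lower()
    if not domain.endswith(INSTITUTIONAL_DOMAINS):
        raised.append("private email")          # e.g. a free webmail address
    if "hospital" in affiliation.lower():
        raised.append("hospital affiliation")
    return raised

# A paper is marked suspect when at least one indicator fires; given the
# high false-positive rate, a human reviewer must then confirm.
print(flags("j.doe@gmail.com", "First Affiliated Hospital of X University"))
# prints ['private email', 'hospital affiliation']
```

The design point is that both indicators are cheap to compute at scale, which is what makes screening thousands of papers feasible in the first place.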
Still, the detectors raise hopes for gaining the advantage over paper mills, which churn out bogus manuscripts containing text, data, and images partly or wholly plagiarized or fabricated, often massaged by ghostwriters. Some papers are endorsed by unrigorous reviewers solicited by the authors. Such manuscripts threaten to corrupt the scientific literature, misleading readers and potentially distorting systematic reviews. The recent advent of artificial intelligence tools such as ChatGPT has amplified the concern.
To fight back, the International Association of Scientific, Technical, and Medical Publishers (STM), representing 120 publishers, is leading an effort called the Integrity Hub to develop new tools. STM is not revealing much about the detection methods, to avoid tipping off paper mills. “There is a bit of an arms race,” says Joris van Rossum, the Integrity Hub’s product director. He did say one reliable sign of a fake is referencing many retracted papers; another involves manuscripts and reviews emailed from internet addresses crafted to look like those of legitimate institutions.
Twenty publishers—including the largest, such as Elsevier, Springer Nature, and Wiley—are helping develop the Integrity Hub tools, and 10 of the publishers are expected to use a paper mill detector the group unveiled in April. STM also expects to pilot a separate tool this year that detects manuscripts simultaneously sent to more than one journal, a practice considered unethical and a sign they may have come from paper mills. Such large-scale cooperation is meant to improve on what publishers were doing individually and to share tools across the publishing industry, van Rossum says.
“It will never be a [fully] automated process,” he says. Rather, the tools are like “a spam filter … you still want to go through your spam filter every week” to check for erroneously flagged legitimate content.
STM hasn’t yet generated figures on accuracy or false-positive rates because the project is too new. But catching as many fakes as possible typically produces more false positives. Sabel’s tool correctly flagged nearly 90% of fraudulent or retracted papers in a test sample. For every 56 true fakes it detected, however, it erroneously flagged 44 genuine papers, so results still need to be confirmed by skilled reviewers. Other paper mill detectors typically have a similar trade-off, says Adam Day, founding director of a startup called Clear Skies who consulted with STM on the Integrity Hub. But without some reliance on automated methods, “You either have to spot check randomly, or you use your own human prejudice to choose what to check. And that’s not generally very fair.”
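Those figures pin down the trade-off: 56 true positives against 44 false positives implies a precision of 56/(56 + 44) = 56%, even at roughly 90% recall. A quick check of the arithmetic (the false-negative count below is an assumption chosen to be consistent with the reported ~90% detection rate, not a number from the preprint):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Sabel's reported test numbers: 56 true fakes flagged per 44 genuine
# papers wrongly flagged; fn=6 is an illustrative assumption that puts
# recall near the reported ~90%.
p, r = precision_recall(tp=56, fp=44, fn=6)
print(f"precision={p:.0%}, recall={r:.0%}")  # prints precision=56%, recall=90%
```

In other words, nearly half of the flagged papers are genuine, which is why every hit still needs a skilled human reviewer before any action is taken.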
Scrutinizing suspect papers can be time-consuming: In 2021, Springer Nature’s postpublication review of about 3000 papers suspected of coming from paper mills required up to 10 part- and full-time staffers, said Chris Graf, the company’s director of research integrity, at a U.S. House of Representatives subcommittee hearing about paper mills in July 2022. (Springer Nature publishes about 400,000 papers annually.)
Newly updated guidelines issued in April by the nonprofit Committee on Publication Ethics, which is funded by publishers, may help ease the workload. Journals may now decide to reject or retract batches of papers suspected of having been produced by a paper mill, even if the evidence is circumstantial. The committee’s previous guidelines encouraged journals to ask authors of each suspicious paper for more information, which can trigger a lengthy back and forth.
Some outsiders wonder whether journals will make good on promises to crack down. Publishers embracing gold open access—under which journals collect a fee from authors to make their papers immediately free to read when published—have a financial incentive to publish more, not fewer, papers. They have “a huge conflict of interest” regarding paper mills, says Jennifer Byrne of the University of Sydney, who has studied how paper mills have doctored cancer genetics data.
The “publish or perish” pressure that institutions put on scientists is also an obstacle. “We want to think about engaging with institutions on how to take away perhaps some of the [professional] incentives which can have these detrimental effects,” van Rossum says. Such pressures can push clinicians without research experience to turn to paper mills, Sabel adds, which is why hospital affiliations can be a red flag.
Publishers should also welcome help from outsiders to improve the technology supporting paper mill detectors, although this will require transparency about how they work, Byrne says. “When tools are developed behind closed doors, no one can criticize or investigate how they perform,” she says. A more public, broad collaboration would likely strengthen them faster than paper mills could keep up, she adds.
Day sees some hope: Flagging journals suspected of being targeted by paper mills can quickly deter additional fraudulent submissions. He points to his analysis of journals that the Chinese Academy of Sciences (CAS) put on a public list because of suspicions they contained paper mill papers. His company’s Papermill Alarm detector showed that before the CAS list came out, suspicious papers made up the majority of some journals’ content; afterward, the proportion dropped to nearly zero within months. (Papermill Alarm flags potentially fraudulent papers based on telltale patterns revealed when a paper mill repeatedly submits papers; the company does not publicly disclose what these signs are.) Journals could drive a similar crash by using automated detectors to flag suspicious manuscripts, nudging paper mills to take them elsewhere, Day says.
Some observers worry paper mill papers will merely migrate to lower-impact journals with fewer resources to detect them. But if many journals act collectively, the entire paper mill business could become much less viable.
It’s not necessary to catch every fake paper, Day says. “It’s about having practices which are resistant to their business model.”