A research-technology firm has developed a new approach to help identify journal articles that originate from paper mills — companies that churn out fake or poor-quality studies and sell authorships.
The technique, described in a preprint posted on arXiv last month, uses factors such as the make-up of a paper’s author list to flag suspicious studies. Its developers at the London-based firm Digital Science say it can help to identify cases in which researchers might have bought their way onto a paper.
Previous efforts to detect the products of paper mills have tended to focus on analysing the content of the manuscripts. One online tool, for example, searches papers for tortured phrases — strange alternative turns of phrase for existing terminology produced by software designed to avoid plagiarism detection. Another tool, being piloted by the International Association of Scientific, Technical, and Medical Publishers (STM), flags when identical manuscripts are submitted to several journals or publishers at the same time.
An approach that instead analyses the relationships between authors could be valuable as paper mills become better at producing convincing text, says Hylke Koers, chief information officer at the STM, who is based in Utrecht, the Netherlands. “This is the kind of signal that is much more difficult to work around or outcompete by clever use of generative AI.”
Paper mills are a growing problem for publishers — according to one estimate, around 2% of all published papers in 2022 resembled studies produced by paper mills — and in recent years publishers have stepped up efforts to tackle them.
As well as being of poor quality, often containing made-up data and nonsensical text, the articles that paper mills churn out are frequently padded with researchers who buy authorship on manuscripts already accepted for publication. Some paper mills claim to have brokered tens of thousands of authorships — including in journals that are indexed in respected databases, such as Web of Science and Scopus.
This can create unusual patterns of co-authorship and networks of researchers that are different from those in legitimate research, says Simon Porter, vice-president for research futures at Digital Science.
Under normal circumstances, “you would expect to find behaviour where a young researcher is publishing with their supervisor, and starts to branch out a little later and publish with other people”, Porter says. “You can see an evolution; it’s not a random network.”
This is not the case with paper-mill products. The technique that Porter and his colleagues developed searches for patterns that indicate paper-mill activity. These include co-author networks composed of early-career researchers whose publication counts suddenly spike, papers featuring several authors with no publication history, and collections of collaborators who are unlikely to have worked together, such as authors from disparate locations or unrelated disciplines.
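Signals of this kind can be computed from ordinary publication records. The following is a minimal sketch of two of them, the share of authors making their first-ever appearance on a paper and the spread of disciplines across the author list; the data layout, field names and thresholds are illustrative assumptions, not Digital Science's actual implementation:

```python
def flag_suspicious_papers(papers, min_new_author_share=0.6, min_fields=3):
    """Flag papers whose author lists match the red flags described
    above: mostly authors with no prior publication history, drawn
    from several unrelated disciplines. Heuristics are illustrative."""
    # Earliest year each author appears anywhere in the corpus.
    first_seen = {}
    for paper in papers:
        for author in paper["authors"]:
            year = first_seen.get(author)
            if year is None or paper["year"] < year:
                first_seen[author] = paper["year"]

    flagged = []
    for paper in papers:
        # Share of authors whose first appearance is this very paper.
        debutants = [a for a in paper["authors"]
                     if first_seen[a] == paper["year"]]
        share = len(debutants) / len(paper["authors"])
        # Unlikely collaborations: authors spanning many disciplines.
        if share >= min_new_author_share and len(paper["disciplines"]) >= min_fields:
            flagged.append(paper["id"])
    return flagged
```

A real screening pipeline would of course draw on a full bibliographic database rather than a single slice of papers, so that "no publication history" means no history anywhere, not just in the sample at hand.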
When they compared the new technique’s results with those of the Problematic Paper Screener, a tool that searches for tortured phrases and other red flags, Porter and his colleagues found substantial overlap: around 10% of authors were flagged directly by both tools, and 72% of authors in the ‘author networks’ data set could be linked through co-authorship to those in the ‘tortured phrases’ data set.
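The linkage figure above is, in essence, a reachability question on the co-authorship graph: what fraction of authors flagged by one tool can be connected, through chains of co-authors, to authors flagged by the other? A rough sketch, assuming a simple adjacency-set representation (the function and variable names are hypothetical, not from the study's code):

```python
from collections import deque

def linked_share(coauthors, flagged_a, flagged_b):
    """Fraction of authors in flagged_a that can be connected to some
    author in flagged_b through a chain of co-authorships.

    coauthors maps each author to the set of people they have
    published with; the search is a plain breadth-first traversal."""
    linked = 0
    for start in flagged_a:
        seen = {start}
        queue = deque([start])
        found = start in flagged_b
        while queue and not found:
            for neighbour in coauthors.get(queue.popleft(), set()):
                if neighbour in flagged_b:
                    found = True
                    break
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        linked += found
    return linked / len(flagged_a)
```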
Although paper mills have quickly evolved so that fewer papers with tortured phrases are being published, Porter thinks the companies will find it difficult to circumvent flagging by these tools while keeping their current business model.
Digital Science has posted the code underlying the technique online, and Porter says that publishers could begin using it straight away.
Joris Van Rossum, programme director at STM Solutions in Amsterdam, says his organization will consider adding the new technology to the STM Integrity Hub — a collection of resources and tools designed to help publishers to detect fraudulent papers. A tool called Signals, which is already on the hub, uses author networks as part of its analysis, he adds.
Chris Graf, research-integrity director at Springer Nature in London, says that obstacles remain, particularly in distinguishing between researchers who share a name and weeding out authors who are flagged erroneously. “We have found that there can be some challenges with data consistency in this context that mean this is not straightforward,” Graf says. “Very brilliant young researchers with a low cluster coefficient could show up as false positives, which is clearly far from ideal.” But he adds: “Having said that, we are exploring a lot of different options, and nothing is off the table.” (Nature’s news team is independent of Springer Nature, its publisher.)
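The “cluster coefficient” Graf mentions measures how interconnected an author’s collaborators are with one another; a low value means the co-authors rarely publish with each other. A minimal pure-Python illustration of the standard local clustering coefficient (not the screening tool’s code):

```python
def local_clustering(coauthors, author):
    """Local clustering coefficient of one author in a co-authorship
    graph: the fraction of pairs of the author's collaborators who
    have also published with each other."""
    nbrs = coauthors.get(author, set())
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for fewer than 2 collaborators; treat as 0
    # Count each pair of collaborators once, checking whether they
    # are themselves connected in the graph.
    links = sum(1 for u in nbrs for v in nbrs
                if u < v and v in coauthors.get(u, set()))
    return 2.0 * links / (k * (k - 1))
```

A supervisor whose students also publish with one another scores high, while a genuinely brilliant early-career researcher with far-flung, unconnected collaborators scores low, which is exactly the false-positive risk Graf describes.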
Anna Abalkina, a sociologist at the Free University of Berlin who has been tracking paper-mill studies for years, says it’s a good idea to scrutinize author networks. “Paper mills definitely do have collaboration anomalies,” she says.
Abalkina warns, however, that our knowledge of paper mills’ business models and processes is limited. It is also difficult to prove that a published study is definitely the product of a paper mill, she notes, which makes it hard to use that as a reason for retraction.
Ultimately, “it’s going to take every trick in the book to be able to provide a convincing filter for paper mills”, Porter says. “It won’t just be one technique.”