An analysis of scientific papers from the past decade shows that artificial intelligence models overuse certain "style" words that were rarely used just a few years ago.
In a new study that has not yet been peer-reviewed, researchers used a novel approach, similar to epidemiology, to reveal how large language models tend to overuse certain words by analyzing "excess word usage" in biomedical papers. The results offer insight into the impact of artificial intelligence on academia, showing that at least 10% of abstracts published in 2024 were processed with large language models.
The study was an extensive analysis of 14 million biomedical abstracts published in PubMed between 2010 and 2024. The researchers used papers published before 2023 as a baseline, comparing them with papers published after large language models such as ChatGPT came into wide use. They found that some words that were once considered "uncommon", such as "delves", now appear 25 times more frequently than before, while other words, such as "showcasing" and "underscores", have seen similar increases. Some already "common" words have also become more frequent: words like "potential", "findings" and "crucial" have increased in frequency by up to 4 percentage points.
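The comparison described above can be sketched in a few lines of Python. This is only a rough illustration, not the researchers' actual method: the function names `doc_frequency` and `excess_usage` are hypothetical, the toy abstracts are invented, and the sketch assumes a plain whitespace tokenizer and a simple baseline average rather than whatever procedure the study used. For each word it reports the ratio and the gap between the share of recent abstracts containing the word and the share of baseline abstracts containing it.

```python
from collections import Counter


def doc_frequency(abstracts):
    """Fraction of abstracts in which each word appears at least once."""
    counts = Counter()
    for text in abstracts:
        # set() so a word repeated inside one abstract is counted once
        counts.update(set(text.lower().split()))
    total = len(abstracts)
    return {word: count / total for word, count in counts.items()}


def excess_usage(baseline, recent):
    """Return {word: (ratio, gap)} for words seen in both periods.

    ratio = recent frequency / baseline frequency
    gap   = recent frequency - baseline frequency
    Words absent from the baseline are skipped so the ratio stays defined.
    """
    base_freq = doc_frequency(baseline)
    recent_freq = doc_frequency(recent)
    result = {}
    for word, freq in recent_freq.items():
        if word in base_freq:
            result[word] = (freq / base_freq[word], freq - base_freq[word])
    return result


# Invented toy data, for illustration only.
baseline = [
    "we show the potential of this method",
    "results show a key effect",
    "this method has potential",
    "a simple study of cells",
]
recent = [
    "this pivotal study shows the potential of the method",
    "we show the potential of cells",
    "results show a key potential effect",
    "a simple method",
]

stats = excess_usage(baseline, recent)
ratio, gap = stats["potential"]
print(f"potential: ratio={ratio}, gap={gap}")
```

A rare word with a large ratio and a common word with a large gap both count as excess usage, which is why the article reports one kind of word as a multiple and the other as a change in percentage.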
The researchers note that an increase of this size is essentially unprecedented without some urgent global event to explain it. Among the excess words from 2013 to 2023, they found nouns closely tied to real-world events, such as "Ebola," "coronavirus," and "lockdown." Among the excess words in 2024, by contrast, almost all were "style" words. Of the 280 excess "style" words in 2024, two-thirds were verbs and about one-fifth were adjectives.
Treating these excess style words as "markers" of ChatGPT use, the researchers estimate that about 15% of papers from non-English-speaking countries such as China, South Korea, and Taiwan are now processed by AI, compared with 3% in English-speaking countries such as the UK. This suggests that large language models may be an effective tool for helping non-native speakers succeed in a field dominated by English.
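One way to see how marker words can yield such an estimate is the sketch below. It is a simplified illustration under stated assumptions, not the study's procedure: the helper names `marker_share` and `llm_usage_lower_bound`, the marker words, and the toy abstracts are all hypothetical. The idea is that if the share of abstracts containing at least one marker word rises above its baseline level, the rise gives a lower bound on the share of abstracts that were processed, since a processed abstract need not contain any marker at all.

```python
def marker_share(abstracts, markers):
    """Fraction of abstracts containing at least one marker word."""
    marker_set = {m.lower() for m in markers}
    hits = sum(
        1 for text in abstracts if marker_set & set(text.lower().split())
    )
    return hits / len(abstracts)


def llm_usage_lower_bound(baseline, recent, markers):
    """Rise in marker share over the baseline, floored at zero."""
    rise = marker_share(recent, markers) - marker_share(baseline, markers)
    return max(0.0, rise)


# Hypothetical marker words and invented abstracts, for illustration only.
markers = {"pivotal", "notably"}
baseline = [
    "a pivotal role in cells",
    "we study cells",
    "results are shown",
    "a simple method",
]
recent = [
    "this pivotal finding",
    "notably the effect grows",
    "we study cells",
    "a pivotal and notably large effect",
]

bound = llm_usage_lower_bound(baseline, recent, markers)
print(f"lower bound on processed share: {bound:.0%}")
```

Running the same calculation separately on abstracts grouped by country would give per-country figures of the kind quoted above.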