A study by the Reuters Institute found that by the end of 2023, nearly half (48%) of the most popular news sites across 10 countries had blocked OpenAI's crawler, while nearly a quarter (24%) had blocked Google's AI crawler.
According to IT House, the institute analyzed the robots.txt files of 15 of the most widely used online news sources, including The New York Times, BuzzFeed News, The Wall Street Journal, The Washington Post, CNN and NPR. The news organizations come from countries such as Germany, India, Spain, the United Kingdom and the United States, and cover three types of media: traditional print media, television broadcasters and digital native media.
The study found that by the end of 2023, more than half (57%) of traditional print media sites, such as The New York Times, had blocked OpenAI's crawler, compared to 48% of TV and radio broadcasters and 31% of digital native media. Similarly, 32% of print media sites had blocked Google's crawler, compared to 19% of broadcasters and 17% of digital native media.
Meanwhile, a recent Cornell University study found that when new AI models are trained using only previous models rather than human inputs, they tend to suffer from "model collapse" or degradation, leading to more errors and misinformation in the generated content.
Web crawlers are used for a variety of purposes. For example, Google's Googlebot crawls publisher websites to include them in search results, while OpenAI's crawler, GPTBot, collects data from the Internet to train the company's large language models, such as those behind ChatGPT. This data helps AI tools generate accurate, up-to-date content, which news publishers are particularly good at delivering: large language models reportedly value high-quality publisher content 5 to 100 times more highly than content from other sources.
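As a rough illustration of how a robots.txt file can turn away AI crawlers while still admitting a search crawler, the following Python sketch parses a sample file with the standard library's urllib.robotparser. The sample file, the example.com URL and the blocks_crawler helper are hypothetical and are not taken from the study; "Google-Extended" is the user-agent token Google uses for AI training, which the article does not name. This is a minimal sketch, not the institute's method.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration; not copied from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def blocks_crawler(robots_txt: str, user_agent: str,
                   url: str = "https://example.com/") -> bool:
    """Return True if the given robots.txt text disallows user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Check the two AI crawlers and the ordinary search crawler against the sample file.
results = {
    agent: blocks_crawler(SAMPLE_ROBOTS_TXT, agent)
    for agent in ("GPTBot", "Google-Extended", "Googlebot")
}

for agent, blocked in results.items():
    print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

With this sample file, the two AI crawlers are disallowed while Googlebot falls through to the wildcard rule and remains allowed. Note that robots.txt is a voluntary convention: it signals a publisher's wishes, and compliance depends on the crawler honoring it.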
The study also noted that news organizations in the Global North (wealthier countries, mostly located in North America, Europe and elsewhere) are more inclined to block AI crawlers than those in the Global South (generally developing countries in regions including Africa, Latin America and the Caribbean, the Pacific Islands, and Asia). For example, in the U.S., 79% of popular online news sites blocked OpenAI's crawler, compared to 20% in Mexico and Poland, while in Germany 60% of news sites blocked Google's crawler, compared to 7% in Poland and Spain.
The study found that almost all sites (97%) that blocked Google's crawler also blocked OpenAI's. While the study doesn't provide a definitive explanation, this may be related to OpenAI having released its crawler before Google did.
Notably, in most countries, some publishers blocked the crawlers as soon as they were released. OpenAI launched its AI crawler in early August 2023, and Google followed suit in September. The study also showed that none of the sites reversed their blocking of OpenAI's or Google's AI crawlers once the decision to block had been made.