Authorization not required? Several AI companies bypass a web standard to crawl news publishers' content

According to Reuters on Saturday, TollBit, a content-licensing startup, has warned publishers that several artificial intelligence companies are circumventing a common web standard that publishers use to block crawling of their content, and are using the crawled content to train generative AI systems.

The warning comes against the backdrop of a public dispute between AI search startup Perplexity and media outlet Forbes over the same web standard. It also comes amid a broader debate between tech and media companies over the value of content in the age of generative AI.

TollBit positions itself as a "matchmaker" between content-hungry AI companies and publishers willing to enter into licensing agreements with them.

Forbes has accused Perplexity of plagiarizing its stories in AI-generated summaries, without citing the source and without permission from Forbes.

In addition, Wired magazine published an investigative story last week which noted that Perplexity may have bypassed the Robots Exclusion Protocol (configured by the news publisher) or other mechanisms meant to block web crawlers.
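
For readers unfamiliar with the Robots Exclusion Protocol, the following is a minimal sketch of how a well-behaved crawler consults a site's robots.txt before fetching a page. It uses Python's standard `urllib.robotparser` module. The crawler names (`ExampleAIBot`, `OtherBot`), the rules, and the `news.example.com` URLs are hypothetical and are not taken from any publisher or AI company mentioned in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a publisher might serve (illustrative only).
# It blocks one named crawler entirely and limits all other crawlers.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /public/
Disallow: /articles/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A compliant crawler runs this check itself and skips disallowed URLs.
for agent, url in [
    ("ExampleAIBot", "https://news.example.com/articles/story.html"),
    ("OtherBot", "https://news.example.com/public/about.html"),
    ("OtherBot", "https://news.example.com/articles/story.html"),
]:
    print(agent, url, may_fetch(parser, agent, url))
```

Under these assumed rules, `ExampleAIBot` is refused everywhere, while other crawlers may read `/public/` but not `/articles/`. The check is performed by the crawler, not by the server: robots.txt is advisory, so a crawler that simply skips this step can still request the pages, which is the kind of circumvention described in this article.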

The News Media Alliance, a trade organization that says it represents more than 2,000 U.S. publishers, also expressed concern that "no-crawl" tools such as robots.txt, which publishers put in place for AI companies, are falling on deaf ears. "If mass crawling cannot be stopped, we cannot profit from our valuable content and have no way to pay journalists," said Danielle Coffey, president of the organization.

TollBit said that Perplexity is not the only violator of the "no-crawl" mechanism on publishers' websites. According to its analysis, "a large number" of AI platforms have bypassed this mechanism, a standard tool that publishers use to indicate which parts of their site can be crawled.

"This means that AI platforms from multiple sources (not just one company) are choosing to bypass the robots.txt protocol to retrieve content from the site," TollBit writes, "and the more publisher logs we acquire, the more times this pattern appears."

A number of publishers, including The New York Times, have already sued AI companies over such alleged infringements. Other publishers have signed licensing agreements with AI companies that are willing to pay for content, although the two sides often disagree on the value of the material. Many AI developers argue that they have violated no laws by obtaining content for free.
