Reddit recently announced that it would tighten its data protection measures to directly target AI companies and other data-scraping tools. The move highlights the growing tension between social media platforms and the AI industry.
Reddit plans to update its Robots Exclusion Protocol file (robots.txt) to block unauthorized automated crawling of the platform. A company spokesperson emphasized that the update does not target any specific company but is intended to "protect Reddit while keeping the Internet open." Reddit also said the changes would not affect "good faith actors" such as the Internet Archive and researchers.
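To illustrate how robots.txt works in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules below are a hypothetical example, not Reddit's actual file, and the crawler names `ExampleArchiveBot` and `ExampleAIBot` are made up. Compliance is voluntary: the check only matters if a crawler chooses to run it before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; this is NOT Reddit's real file.
# One named crawler is allowed everywhere; every other agent is blocked.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = SAMPLE_ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines instead of downloading the file.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleArchiveBot", "ExampleAIBot"):
        # Prints "ExampleArchiveBot True", then "ExampleAIBot False".
        print(agent, can_crawl(agent, "https://www.reddit.com/r/example/"))
```

A well-behaved crawler would run a check like this before each request; a crawler that skips it is not technically prevented from fetching the page, which is why a site may also rely on its terms of service and other enforcement.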
The move appears to be a response to recent reports that AI companies, such as Perplexity, are bypassing websites' robots.txt rules. Perplexity's CEO once called robots.txt "not a legal framework" in an interview with Fast Company, a remark that has fueled controversy over AI companies' data-acquisition practices.
Reddit's position is clear: any company using automated agents to access its platform must comply with its terms and policies and communicate with Reddit. This may signal that Reddit wants to establish licensing agreements with AI companies similar to the ones it has with Google and OpenAI.
This isn't the first time Reddit has taken a hard line on data access. Last year, the company began charging AI companies for API usage and struck licensing deals with some AI companies to allow them to train models using Reddit's data. These agreements have become an important source of revenue for Reddit.
Reddit's move reflects how social media platforms are trying to balance protecting user-generated content with seeking new revenue models. As AI technology rapidly evolves, similar disputes over data access could play out on other platforms, prompting a broader discussion about data ownership, usage, and how the value of that data is distributed.