Recently, Meta quietly released a new web crawler that scours the Internet and collects large amounts of data to support its artificial intelligence models.
According to three companies that track web scrapers, Meta's new web crawler, Meta External Agent, was launched last month. Like OpenAI's GPTBot, it crawls the Internet for AI training data, such as the text of news articles or the conversations in online discussion groups.
According to the page's version history, Meta updated a company website for developers in late July with a tab disclosing the existence of the new crawler. However, Meta has not publicly announced the crawler.
Meta's Llama is one of the largest large language models (LLMs). The company has not disclosed the training data used for the latest version of the model, Llama 3, but the initial version was trained on large datasets collected from other sources, such as Common Crawl.
Earlier this year, Meta co-founder and CEO Mark Zuckerberg boasted on an earnings call that the company's social platforms had amassed a dataset for AI training that was "bigger than even Common Crawl."
The existence of the new crawler suggests that Meta's massive dataset may no longer be sufficient. As the company continues to update Llama and expand Meta AI, it needs a steady supply of new, high-quality training data to keep improving their capabilities.
Data from Dark Visitors shows that nearly 25% of the world's most popular websites now block GPTBot, but only 2% block Meta's new crawler.