Meta deploys new web crawler bot to collect massive amounts of data for its AI models

Meta recently and quietly released a new web crawler that scours the Internet and collects large amounts of data to support its artificial intelligence models.

According to three companies that track web scrapers, Meta's new web crawler bot, Meta External Agent, was launched last month. Similar to OpenAI's GPTBot, it crawls the Internet for AI training data, such as the text of news articles or conversations in online discussion groups.

Meta has yet to publicly announce the new crawler. According to the page's revision history, however, the company did update one of its developer websites in late July with a tab indicating that the crawler exists.

Meta’s Llama is one of the largest LLMs. The company has not disclosed the training data used for the latest version of the model, Llama 3, but the initial version was trained on large datasets collected by other sources, such as Common Crawl.

Earlier this year, Meta co-founder and CEO Mark Zuckerberg boasted on an earnings call that the company’s social platform had amassed a dataset for AI training that was “bigger than even Common Crawl.”

The existence of the new crawler suggests that Meta's massive dataset may no longer be sufficient. As the company continues to update Llama and expand Meta AI, it needs a steady supply of new, high-quality training data to keep improving the models' capabilities.

Data from Dark Visitors shows that nearly 25% of the world's most popular websites now block GPTBot, but only 2% of them block Meta's new crawler bot.
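Sites typically block crawlers like these through rules in their robots.txt file. As a minimal sketch of how such a check could work, the Python snippet below parses a robots.txt body and reports which crawlers are disallowed. The user-agent token `meta-externalagent`, the sample rules, and the helper name `blocked_agents` are assumptions made for illustration; the token a site should actually use is whatever Meta's developer page lists.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt: str, agents: list[str], path: str = "/") -> dict[str, bool]:
    """Return {agent: True if the agent is disallowed from `path`}."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, path) for agent in agents}


# Hypothetical robots.txt that blocks both AI crawlers but allows everyone else.
# "meta-externalagent" is an assumed token for Meta External Agent.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    agents = ["GPTBot", "meta-externalagent", "Googlebot"]
    for agent, blocked in blocked_agents(SAMPLE_ROBOTS_TXT, agents).items():
        print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

Note that robots.txt is only a request to crawlers: the check shows what a site asks for, not whether a given bot honors it.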
