Meta deploys new web crawler bot to collect large amounts of data for its AI models

Meta recently and quietly released a new web crawler that scours the internet and collects large amounts of data to support its artificial intelligence models.

According to three companies that track web scrapers, Meta's new web crawler, Meta External Agent, was launched last month. Like OpenAI's GPTBot, it crawls the internet for AI training data, such as the text of news articles or the conversations in online discussion groups.

According to the page's version history, Meta updated a company website for developers in late July with a tab indicating the existence of the new crawler, but the company has yet to announce the crawler publicly.

Meta’s Llama is one of the largest LLMs. The company has not disclosed the training data used for the latest version of the model, Llama 3, but the initial version used large datasets collected by other sources, such as Common Crawl.

Earlier this year, Meta co-founder and CEO Mark Zuckerberg boasted on an earnings call that the company’s social platform had amassed a dataset for AI training that was “bigger than even Common Crawl.”

The existence of the new crawler suggests that Meta's massive dataset may no longer be sufficient. As the company continues to update Llama and expand Meta AI, it typically needs new, high-quality training data to keep improving the models' capabilities.

Data from Dark Visitors shows that nearly 25% of the world's most popular websites now block GPTBot, but only 2% of them block Meta's new crawler bot.
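Websites usually block crawlers like these through rules in their robots.txt file. The following is a minimal sketch of how such a block could be checked with Python's standard library. The robots.txt content is hypothetical, and the user-agent token `meta-externalagent` is an assumption not stated in this article, so verify it against Meta's developer documentation before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
# The "meta-externalagent" token is an assumption, not taken from the article.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules disallow user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    for agent in ("GPTBot", "meta-externalagent", "SomeOtherBot"):
        print(agent, "blocked" if is_blocked(ROBOTS_TXT, agent, page) else "allowed")
```

Note that robots.txt is a voluntary convention: it tells a crawler what the site owner wants, but it only works if the crawler chooses to honor it.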
