Jan. 15, 2025 - On Tuesday, in the AI copyright case Kadrey v. Meta, the court made public records of internal communications between Meta executives and researchers. The documents show that while developing its latest AI model, Llama 3, Meta's executives and researchers treated surpassing OpenAI's GPT-4 as a core goal and were intensely competitive in internal discussions.
In an October 2023 message to researcher Hugo Touvron, Ahmad Al-Dahle, VP of Generative AI at Meta, said, "Honestly... our goal has to be GPT-4. We are about to have 64,000 GPUs! We have to learn how to build cutting edge technology and win this race."
While Meta has long been known for its open-source AI models, its AI team is clearly more focused on outperforming competitors who don't disclose their model weights, such as Anthropic and OpenAI. Meta executives and researchers treat Anthropic's Claude and OpenAI's GPT-4 as industry benchmarks and strive to match them.
In internal discussions, Meta appeared dismissive of French AI startup Mistral. Mistral is one of Meta's main competitors in the open-source space, but Al-Dahle was blunt in his message: "Mistral is not worth it to us. We should be able to do better."
At a time when tech companies are racing to deliver cutting-edge AI models, internal communications from Meta further reveal the highly competitive mindset of its AI leadership. In multiple exchanges, Meta's AI leaders mentioned that they were "very aggressive" in acquiring the data needed to train the Llama model. One executive even said in an internal email, "Llama 3 is pretty much the only thing I care about."
However, this aggressive competitive strategy also raises legal questions. The plaintiffs in the case allege that Meta executives, in their rush to launch AI models, approved the use of copyrighted books for training. Touvron noted in a message that the dataset used to train Llama 2 was "of poor quality" and discussed how improving the data sources could boost Llama 3's performance. Touvron and Al-Dahle then discussed the possibility of using the LibGen dataset, which contains copyrighted works from publishers such as Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education. In a message, Al-Dahle asked: "Do we have the right dataset? Is there any data you'd like to use but can't for some stupid reason?"
1AI notes that Meta CEO Mark Zuckerberg has previously stated that he's working to close the performance gap between the Llama model and closed-source models from OpenAI, Google, and others. Internal sources indicate that there is a lot of pressure within the company to achieve this goal. In a July 2024 letter, Zuckerberg wrote: "This year, Llama 3 has been able to compete with state-of-the-art models and lead in some areas. Starting next year, we expect future Llama models to be the most advanced in the industry."
In April 2024, Meta officially released Llama 3. The open-source AI model is on par with closed-source models from Google, OpenAI, and Anthropic in terms of performance, and outperforms open-source models from Mistral. However, the data Meta used to train the model - which Zuckerberg allegedly approved for use - is of questionable copyright status and is facing scrutiny in multiple lawsuits.