Tech giants exposed for using YouTube content to train AI without authorization, including Apple and Nvidia

According to Wired, several technology giants, including Apple, used subtitle files from YouTube videos to train artificial intelligence models without the video creators' consent.

The creators affected include the well-known technology blogger MKBHD (Marques Brownlee), the YouTubers MrBeast and PewDiePie, and the talk show hosts Stephen Colbert, John Oliver and Jimmy Kimmel. The subtitle files used to train AI are effectively text transcripts of the videos.

Investigative journalists have revealed that some of the world’s richest tech companies have been using material from thousands of YouTube videos to train AI, in violation of YouTube’s rules against scraping content from the platform without permission. More than 173,000 YouTube video subtitle files from 48,000 channels were used to train AI models by Silicon Valley giants including Apple, Nvidia and Salesforce.

According to reports, the subtitle files were downloaded by a non-profit organization called EleutherAI, which says its purpose is to help developers train AI models. Although EleutherAI's original intention may have been to provide training materials for small developers and academic researchers, the dataset has also been used by technology giants such as Apple.

According to a research paper published by EleutherAI, this dataset is part of a larger dataset called "The Pile" that they released. Most of the datasets in "The Pile" are public and can be accessed by anyone with enough storage space and computing power. In addition to technology giants, some academics and developers have also used the dataset. However, companies with a market value of tens or even hundreds of billions of dollars, such as Apple, Nvidia, and Salesforce, have also mentioned in their research papers and posts how they use the dataset to train AI models.

Documents show that Apple used “The Pile” to train its much-anticipated OpenELM model a few weeks before releasing it in April. The release of the OpenELM model coincided with Apple’s announcement that it will add new AI features to iPhones and MacBooks.

It should be noted that Apple did not download the data itself; EleutherAI did. So technically, it was EleutherAI that violated YouTube's terms of use.

While Apple and other companies may have used publicly available datasets, the incident highlights the legal risks of scraping data from the web to train AI systems. There have been cases of AI systems plagiarizing entire paragraphs of text when answering niche questions, and when companies use datasets compiled by third parties, it only increases the risk of using material without permission.
