Apple, Nvidia and other tech companies reportedly used YouTube videos to train AI without permission

According to Wired magazine, a number of tech giants, including Apple, Nvidia, Anthropic, and Salesforce, have been found to have used thousands of YouTube videos without authorization to train their artificial intelligence models, sparking serious copyright and ethical controversy.

The report reveals that the companies incorporated subtitles from a wide range of YouTube videos into their AI training datasets. The affected creators include well-known YouTubers MKBHD, MrBeast, and Jacksepticeye; late-night hosts Stephen Colbert, John Oliver, and Jimmy Kimmel; educational channels such as MIT, Khan Academy, and Harvard; and mainstream media outlets such as the Wall Street Journal and NPR.

The data was downloaded and compiled by a non-profit organization called EleutherAI, which included the content in a large dataset it released called "The Pile". The dataset was originally intended to provide training material for small developers and academics, but it has since been used by major tech companies as well.

It is worth noting that companies such as Apple did not download this data directly from YouTube; they used the dataset compiled by EleutherAI. Technically, it is EleutherAI, not these tech companies, that directly violated YouTube's terms of use.

The incident has sparked a discussion about the legality and ethics of how AI training data is sourced. It highlights the importance of copyright and usage licensing for data in the rapidly evolving field of AI, as well as the inadequacy of existing laws and regulations in the face of these emerging technologies. It also raises new questions about how to balance the rights and interests of creators, platforms, and AI companies.
