YouTuber sues OpenAI for $5 million for using video transcriptions without permission

A YouTuber filed a class action lawsuit in the U.S. District Court for the Northern District of California last Friday, alleging that OpenAI scraped transcripts of millions of YouTube videos to train its generative AI models without notifying or compensating the video owners.

The YouTuber, David Millette of Massachusetts, accuses OpenAI of scraping his videos and those of other creators to train AI models. The products involved include ChatGPT and Sora, among others.

The lawsuit alleges that OpenAI reaped “generous rewards” from the data it collected, and that the practice violated copyright law and YouTube’s terms of service.

Millette has retained the law firm Bursor & Fisher to pursue the class action. The plaintiff requests a jury trial and seeks more than $5 million (currently about RMB 35.683 million) in damages on behalf of all YouTube users and creators whose data may have been used in OpenAI's training.

Generative AI models are not intelligent in the human sense. They learn patterns and probabilities by processing large numbers of data samples (such as movies, recordings, and papers). The training data for many models comes from public websites and data sets on the Internet. Although companies claim that their data scraping is covered by the principle of "fair use", many copyright holders disagree and have filed lawsuits to stop the practice.

Video transcripts have become an important source of training data, especially as other sources dry up. According to Originality.AI, more than 35% of the world's top websites have blocked OpenAI's web crawler. In addition, research from MIT's Data Provenance Initiative shows that about 25% of high-quality data sources have been restricted, making training data for AI models scarcer.

It is worth noting that OpenAI's Whisper model has been used to transcribe the audio of videos in order to obtain more training data. According to the New York Times, the OpenAI team transcribed more than one million hours of YouTube videos and used the resulting text to train its GPT-4 model. This prompted internal discussions about whether the practice might violate YouTube's rules.
