According to a New York Times report, OpenAI, Google and Meta have been accused of improper conduct in obtaining data to train their artificial intelligence models.
The New York Times report states that OpenAI used its speech recognition tool Whisper to transcribe audio from YouTube videos, and that OpenAI employees allegedly discussed how doing so might violate the video site's rules. OpenAI ultimately transcribed more than 1 million hours of YouTube videos, with help from OpenAI President Greg Brockman, and these transcripts were used to train the GPT-4 model.
The report also said that Meta had considered acquiring the publisher Simon & Schuster to obtain long-form works for AI training. Meta also discussed "collecting copyrighted data from the Internet, even if it might face litigation," reasoning that "negotiating licenses with publishers, artists, musicians and the news industry would take too long."

Google was accused of transcribing YouTube videos to obtain text for AI model training, which the New York Times said "probably" violated the videos' copyrights. The report also said that Google changed its terms of service to allow it to use publicly available Google Docs, restaurant reviews on Google Maps, and other online content for training AI.
The New York Times seems to be trying to paint a dire picture of mass infringement, but generally stops short of saying so directly. These are reasonable conversations that any company developing AI should have in order to treat others fairly and comply with the law. AI companies are doing exactly that by relying on fair use, which is at the heart of OpenAI's defense against the New York Times lawsuit. The story did not disclose that the New York Times is suing OpenAI until the 17th paragraph, which makes the article read like an attack on a company the paper regards as an adversary.
The New York Times report has sparked discussion about the legality and ethics of AI companies’ training data, and has also highlighted the challenges and controversies the AI industry faces in data acquisition.