Training data - ai, artificial intelligence, 1ai.net


Study: Training Data Containing 0.001% of Misinformation Is Enough to "Poison" Medical AI Models

January 14 - A study from New York University has revealed the potential risks of training large language models (LLMs) on medical information. The study shows that training data containing as little as 0.001% incorrect information can cause a model to output inaccurate medical answers. Data "poisoning" is a relatively simple concept: LLMs are usually trained on large amounts of text, mostly from the Internet. By injecting specific information into the training data, an attacker can get the model to treat that information as fact when generating answers. This approach doesn't even require direct access to the LLM itself, just the purpose...
Information
- 1.6k
2 months ago
Microsoft clarifies: it won't use users' Word and Excel data to train AI models

Microsoft Office is known for its Connected Experiences feature, which analyzes user-created content to provide design recommendations, editing suggestions, data insights, and more. However, 1AI notes that @nixCraft, a blogger at cybersecurity blog Cyberciti.biz, claims that Microsoft's Connected Experiences feature automatically grabs data from users' Word and Excel documents and uses it to train the company's AI models. What's even more troubling is that the feature is turned on by default,...
Information
- 2.4k
3 months ago
The AI industry faces the challenge of a "data wall": high-quality training data may be exhausted by 2028

Recently, the shortage of training data for large AI models has once again become the focus of media attention. The Economist's latest article, "AI companies will soon exhaust most of the Internet's data," has sparked widespread discussion in the industry. The article points out that as high-quality Internet data runs out, the AI field is facing the challenge of a "data wall." Research company Epoch AI predicts that all high-quality text data on the Internet will be exhausted by 2028, and machine learning data sets may exhaust all "high-quality language data" by 2026. This "…
Information
- 6.6k
7 months ago
OpenAI CTO: Not sure where Sora's training data came from

OpenAI recently launched Sora, its much-discussed text-to-video generation model, but the company's Chief Technology Officer (CTO), Mira Murati, was vague in an interview with the Wall Street Journal and could not clearly explain the source of Sora's training data. When the reporter asked Murati directly where Sora's training data came from, she answered only in generic official language: "We use publicly available data and licensed data." When the reporter asked whether the specific source included YouTube videos, Murati...
Information
- 2.3k
1 year ago
ChatGPT and other models: By 2026, high-quality training data will be exhausted

MIT Technology Review published an article on its official website stating that, with the continued popularity of large models such as ChatGPT, the demand for training data is increasing. Large models are like a "network black hole" that constantly absorbs data, which will eventually leave too little data for training. The well-known AI research institute Epoch AI published a paper addressing the training data problem directly, pointing out that by 2026, large models will consume all high-quality data; by 2030-2050, they will consume all low-quality data; by 2030-2060, they will consume all image training data...
Information
- 2.6k
1 year ago

