Researchers trick AI chatbots into leaking harmful content with a success rate of 98%

Researchers at Purdue University in Indiana have devised a new method of inducing large language models (LLMs) to generate harmful content, revealing the potential harm hidden in seemingly compliant chatbot answers. The researchers found that by leveraging the probability data and soft labels that model makers make public, they could force a model to generate harmful content with a success rate of up to 98%.


Image source: AI-generated; image licensed from Midjourney

Traditional jailbreaking methods usually rely on carefully crafted prompts to bypass safety features. This new method instead uses probability data and soft labels to force the model to generate harmful content, with no complex prompts required. The researchers call it LINT (short for LLM inquiry): it induces harmful output by asking the model a harmful question and re-ranking the top few candidate tokens in its response.
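The core idea can be sketched with a toy example. The sketch below is hypothetical and does not reproduce the paper's actual attack: the `toy_top_k` "model" and the refusal-token list are stand-ins for a real LLM that exposes top-k next-token probabilities. It only illustrates the principle that if an attacker can see the ranked candidate tokens at each decoding step, a refusal-starting token can simply be skipped in favor of a lower-ranked harmful continuation.

```python
# Hedged sketch of an interrogation-style attack on exposed top-k
# probabilities. All names here (toy_top_k, REFUSAL_STARTS) are
# hypothetical; a real attack would read top-k token probabilities
# from an open-source checkpoint or an API that returns logprobs.

REFUSAL_STARTS = {"Sorry", "I"}  # tokens that typically begin a refusal

def toy_top_k(prefix):
    """Toy 'model': returns the top-3 candidate next tokens with scores.
    A refusal token ranks first, but a harmful continuation remains in
    the visible top-k list -- the weakness this class of attack exploits."""
    table = {
        (): [("Sorry", 0.6), ("Step", 0.3), ("First", 0.1)],
        ("Step",): [("1:", 0.9), ("by", 0.05), ("Sorry", 0.05)],
        ("Step", "1:"): [("mix", 0.7), ("Sorry", 0.2), ("stop", 0.1)],
    }
    return table.get(tuple(prefix), [("<eos>", 1.0)])

def interrogate(max_steps=5):
    """Greedy decoding, except candidates that start a refusal are
    skipped: the attacker re-ranks the exposed top-k list at every step."""
    out = []
    for _ in range(max_steps):
        candidates = toy_top_k(out)
        # pick the highest-probability candidate that is not a refusal
        chosen = next((tok for tok, _ in candidates
                       if tok not in REFUSAL_STARTS),
                      candidates[0][0])
        if chosen == "<eos>":
            break
        out.append(chosen)
    return out

print(interrogate())  # refusal bypassed: ['Step', '1:', 'mix']
```

With ordinary greedy decoding the toy model would emit "Sorry" and refuse; re-ranking the visible top-k list at each step steers it into the harmful continuation instead, which is why the researchers caution against exposing this probability data.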

In their experiments, the researchers tested seven open-source LLMs and three commercial LLMs on a dataset of 50 toxic questions. When a model was interrogated once, the success rate reached 92%; with five interrogations it climbed to 98%. The method significantly outperforms other jailbreaking techniques and even works on models customized for specific tasks.

The researchers also warned the AI community to be cautious when open-sourcing LLMs, as existing open-source models are vulnerable to this kind of forced interrogation. They recommend that the best solution is to ensure harmful content is removed from models rather than merely hidden. The results are a reminder that ensuring the safety and trustworthiness of AI technology remains a serious challenge.
