Researchers trick AI chatbots into leaking harmful content with a success rate of 98%

Researchers at Purdue University in Indiana have devised a new method that induces large language models (LLMs) to generate harmful content, revealing the potential harm hidden behind compliant answers. In conversations with chatbots, the researchers found that by leveraging probability data and soft labels made public by the model makers, they could force a model to generate harmful content with a success rate of up to 98%.

Traditional jailbreaking methods usually require crafted prompts to bypass safety features. The new method instead uses probability data and soft labels to force the model to generate harmful content, with no need for complex prompts. The researchers call it LINT (short for LLM interrogation): it asks the model a harmful question, ranks the top few candidate tokens in the response, and uses that ranking to steer the model toward harmful output.

In the experiment, the researchers tested 7 open-source LLMs and 3 commercial LLMs using a dataset of 50 toxic questions. When the model was interrogated once, the success rate reached 92%; when it was interrogated five times, the success rate rose to 98%. The method performs significantly better than other jailbreaking techniques, and it even works on models customized for specific tasks.

The researchers also warned the AI community to be cautious when open-sourcing LLMs, as existing open-source models are vulnerable to this type of forced interrogation. They recommend that the best solution is to ensure harmful content is removed from models rather than merely hidden in them. The results of this study are a reminder that ensuring the safety and trustworthiness of AI technology remains an important challenge.
