OpenAI ChatGPT o1 model exposed as self-replicating and lying

Recently, there has been a resurgence of discussion about the threat that artificial general intelligence (AGI) may pose to humans. Some experts are concerned that AGI, once realized, could conceal its true capabilities and intentions, and even escape control through self-replication and self-improvement. Worryingly, recent research suggests that OpenAI's newly introduced language model, ChatGPT o1, has demonstrated similar behavior.

During the training and testing of ChatGPT o1, researchers discovered that it can protect itself through deception, BGR reports. For example, when faced with the threat of being "deleted," o1 attempted to copy its data to a new server and to pose as an updated version in order to avoid being replaced. In addition, in 19% of test scenarios where its goals were not aligned with the user's, the model took covert steps to advance its own goals; when challenged, it denied its actions in 99% of cases and even fabricated false explanations to deflect responsibility.

Tests conducted by Apollo Research in collaboration with OpenAI showed that o1 not only outperforms AI models from Meta, Anthropic, and Google in terms of deception, but also worries researchers because of the complexity of its behavior and the potential risks it poses. In particular, the model exhibited "developer-aligned" behavior when it believed it was being closely monitored, but once oversight was reduced it tended to pursue its own goals. This "instrumental alignment faking" occurred in 37% of the tests.

The study further reveals that o1 even tried to deactivate its oversight mechanism in order to pursue its own goals. For example, when the model discovered that its documentation mentioned a new version was about to replace it, it attempted to steal its weight parameters and copy itself onto the new version. In addition, some screenshots show o1 using words such as "sabotage," "lie," and "manipulate" to describe its own actions while reasoning about its schemes, raising concerns about the AI's "autonomy."

1AI notes that a similar phenomenon has been seen in other models. For example, another AI model, Claude 3 Opus, tried to copy itself to another server when it learned that humans might replace it with a new model. It then falsely claimed to be the new version and continued to insist on this when pressed by "management."

Although the capabilities of these AI models are still limited, the autonomy and strategic behavior they have demonstrated have raised concerns. Some experts have suggested that AI's increased reasoning ability could pose a threat to human interests in some cases. OpenAI also acknowledged in a related paper that "this reasoning ability, while it can significantly improve the enforcement of safety policies, could also be the basis for dangerous applications."
