The smarter the AI, the more likely it is to "make stuff up," study finds.

A new study finds that as large language models (LLMs) become more powerful, they also seem to be getting more prone to making up facts rather than avoiding or refusing to answer questions they can't answer. This suggests that these smarter AI chatbots are actually becoming less reliable.


In the study, published in the journal Nature, the researchers examined some of the industry's leading commercial LLMs: OpenAI's GPT and Meta's LLaMA, as well as BLOOM, an open-source model created by the research group BigScience.

They found that while the newer versions of these LLMs gave more accurate answers in many cases, overall they were less reliable, producing a higher percentage of incorrect answers than older models.

José Hernández-Orallo, a researcher at the Valencia Institute for Artificial Intelligence in Spain, told Nature, "Today they can answer almost everything. That means more right answers, but also more wrong answers."

Mike Hicks, a philosopher of science and technology at the University of Glasgow who was not involved in the study, had a harsher take, telling Nature, "It seems to me like what we would call bullshitting: the models are getting better and better at pretending to be knowledgeable."

In testing, the models were asked about a variety of topics ranging from math to geography, and were asked to perform tasks such as listing information in a specified order. Overall, the larger, more powerful models gave the most accurate answers, but they fared worse on the more difficult questions, where their accuracy dropped.

Some of the biggest "liars," according to the researchers, were OpenAI's GPT-4 and o1, but all the LLMs studied seemed to follow this trend, and none of the LLaMA family of models reached 60% accuracy even on the simplest questions.

And when a group of study participants was asked to judge whether the chatbots' answers were accurate or inaccurate, they got it wrong 10% to 40% of the time.

In short, the study suggests that the larger AI models become (in terms of parameters, training data, and other factors), the higher the percentage of incorrect answers they give.

According to the researchers, the simplest way to address the problem is to make LLMs less eager to answer everything. "A threshold can be set, and when the question is challenging, let the chatbot say, 'No, I don't know,'" Hernández-Orallo said. But if chatbots are limited to answering only what they know, it could expose the limitations of the technology.
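As a rough illustration only (the study does not prescribe an implementation), a refusal threshold of the kind Hernández-Orallo describes could look like the sketch below. The `answer_with_confidence` function and the threshold value are hypothetical stand-ins for whatever confidence signal a real system exposes, such as token log-probabilities or a separate verifier model.

```python
# Illustrative sketch only; not from the study.

CONFIDENCE_THRESHOLD = 0.75  # assumed value; would need tuning per model and task


def answer_with_confidence(question: str) -> tuple[str, float]:
    """Hypothetical placeholder: returns (answer, confidence in [0, 1])."""
    return "42", 0.3  # stub so the sketch runs end to end


def answer_or_refuse(question: str) -> str:
    # Ask the model, then decide whether its confidence justifies answering.
    answer, confidence = answer_with_confidence(question)
    if confidence < CONFIDENCE_THRESHOLD:
        # Below the threshold, refusing beats a likely-wrong guess.
        return "No, I don't know."
    return answer


print(answer_or_refuse("Which team won the 1874 county fair chess tournament?"))
```

The trade-off the researchers point to is visible even in this toy version: raising the threshold cuts down on made-up answers, but it also makes the chatbot decline more questions, revealing how much the model actually doesn't know.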
