According to news on February 12, the BBC recently conducted a large-scale study evaluating the performance of artificial intelligence (AI) chatbots in news summarization. The study covered several well-known AI tools, including Microsoft's Copilot, OpenAI's ChatGPT, Google's Gemini, and Perplexity. The results showed that these tools have substantial accuracy problems when generating news summaries.
For the study, the BBC asked these AI tools to summarize 100 news stories and posed content-related questions based on the summaries. The results showed that more than half of the answers generated by the AI had "significant problems," and about one-fifth of the answers introduced a clear factual error. Deborah Turness, chief executive of BBC News and Current Affairs, noted that "more than one in ten 'quotes' from BBC articles quoted by AI assistants are altered or simply do not exist in the original article."
In addition, the study found that AI assistants cannot separate fact from opinion in news summaries, cannot distinguish current information from archived material in news stories, and are prone to mixing subjective opinions into their answers. Turness said that the results generated by these AI tools are often a confused mix that is far from the verified facts and clarity that consumers expect.
Notably, the BBC study also found that Microsoft's Copilot and Google's Gemini had more significant problems with news summaries than ChatGPT and Perplexity. They performed worse at distinguishing opinion from fact and were more prone to editorializing and to omitting key contextual information.
1AI notes that accuracy issues with AI tools are not limited to these chatbots. Apple also recently sparked controversy when its Apple Intelligence notification feature shared incorrect headlines. The company was criticized by news outlets and press freedom groups and temporarily disabled the feature.
The BBC called for a moratorium on the use of AI-generated news summaries until it can hold an in-depth conversation with AI service providers and a solution is found. Turness said, "We want to work together to find a solution."