China's CAICT launches large AI model hallucination evaluation covering five test dimensions

March 19 - 1AI has learned from the official WeChat account of the China Academy of Information and Communications Technology (CAICT) that, to map out the current state of large model hallucination and help large model applications become deeper and more practical, CAICT's Artificial Intelligence Institute has launched a large model hallucination test, building on its earlier AI Safety Benchmark evaluation work.

Large model hallucination (AI hallucination) refers to a model generating content or answers that appear reasonable but are actually inconsistent with the user's input (faithfulness hallucination) or with the facts (factuality hallucination). As large models are widely deployed in key areas such as healthcare and finance, the potential application risk posed by hallucination is growing and is drawing widespread attention in the industry.

This round of hallucination testing targets large language models and covers both factuality and faithfulness hallucinations. The evaluation framework is as follows:

The test data contains more than 7,000 Chinese test samples. The test format consists of two kinds of questions: information extraction and knowledge reasoning questions, which correspond to faithfulness hallucination detection, and factual discrimination questions, which correspond to factuality hallucination detection. Overall, five test dimensions are covered: humanities, social sciences, natural sciences, applied sciences, and formal sciences.

The China Academy of Information and Communications Technology (CAICT) invites all relevant enterprises to take part in the model evaluation and jointly promote the safe application of large models.