OpenAI Open-Sources SimpleQA, a New Benchmark to Curb Large Models' "Nonsense"

October 31, 2024: On October 30 (local time), OpenAI announced that it is open-sourcing SimpleQA, a new benchmark that measures how well language models answer short, fact-seeking questions. The goal is to gauge the factual accuracy of language models.

One open challenge in AI is how to train models to produce factually correct answers. Today's language models sometimes give incorrect answers or answers that are not backed by evidence, a problem known as "hallucination." Models that give more accurate answers and hallucinate less are more trustworthy and can be used in a wider range of applications.

OpenAI says it designed SimpleQA to be a dataset with the following characteristics:

High correctness: The reference answer to each question is verified by two independent AI trainers, so that grading is fair.
Diversity: SimpleQA covers a wide range of topics, from science and technology to TV shows and video games.
Challenging for frontier models: Compared with earlier benchmarks such as TriviaQA (2017) or NQ (2019), SimpleQA is considerably harder, especially for frontier models such as GPT-4o, which scores below 40%.
Good researcher experience: SimpleQA's questions and answers are short and unambiguous, so the benchmark runs quickly and answers can be graded quickly, for example through the OpenAI API. In addition, with 4,326 questions, SimpleQA should have relatively low variance as an evaluation.
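Why do 4,326 questions give low variance? A rough way to see it is to treat each question as an independent pass/fail trial, so the standard error of an observed accuracy p over n questions is sqrt(p(1 - p)/n). The sketch below is a simplifying assumption, not OpenAI's grading code from simple-evals; the function name `accuracy_with_standard_error` and the score of 1,700 correct answers are hypothetical, chosen only to land just under the 40% figure cited for GPT-4o.

```python
import math


def accuracy_with_standard_error(num_correct: int, num_questions: int) -> tuple[float, float]:
    """Hypothetical helper: return (accuracy, binomial standard error).

    Assumes every question is an independent pass/fail trial, so the
    standard error of the observed accuracy p is sqrt(p * (1 - p) / n).
    """
    if num_questions <= 0:
        raise ValueError("num_questions must be positive")
    if not 0 <= num_correct <= num_questions:
        raise ValueError("num_correct must be between 0 and num_questions")
    p = num_correct / num_questions
    se = math.sqrt(p * (1 - p) / num_questions)
    return p, se


if __name__ == "__main__":
    n = 4326  # number of SimpleQA questions reported in the article
    correct = 1700  # hypothetical score, just under 40%
    acc, se = accuracy_with_standard_error(correct, n)
    # Approximate 95% confidence interval via the normal approximation (+/- 1.96 SE).
    low, high = acc - 1.96 * se, acc + 1.96 * se
    print(f"accuracy={acc:.1%}  SE={se:.2%}  95% CI=[{low:.1%}, {high:.1%}]")
```

Because p(1 - p) is at most 0.25, the standard error over 4,326 questions can never exceed about 0.76 percentage points (reached at p = 0.5). Under this simplified model, a gap of several points between two models is unlikely to be sampling noise.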

SimpleQA is intended as a simple but challenging benchmark for evaluating the factual accuracy of frontier models. Its main limitation is scope. Although SimpleQA is accurate, it measures factual accuracy only in the constrained setting of short, fact-seeking queries that have a single verifiable answer.

OpenAI says it remains an open research question whether a model's factuality on short answers correlates with its performance on long responses that contain many facts. OpenAI hopes that open-sourcing SimpleQA will further advance AI research and help make models more trustworthy and reliable.

Related links:

Open-source repository: https://github.com/openai/simple-evals/
Paper: https://cdn.openai.com/papers/simpleqa.pdf
