2024 College Entrance Exam Results Released: Doubao and Two Other Domestic AIs Clear the First-Tier Line for Liberal Arts

College entrance exam results are now being released across the country, and news about candidates' scores keeps making headlines.

Recently, the scores of an unusual group of candidates were also released: an "exam squad" made up of large AI models from various companies.

Large models clear the liberal arts first-tier line; Doubao scores highest among domestic AIs

On June 24, Geek Park released its latest evaluation of large models on the New Curriculum Standard Paper I of the college entrance exam. GPT-4o ranked first in the liberal arts total with 562 points. Among domestic products, ByteDance's Doubao led with 542.5 points.

Next came Baidu's Wenxin Yiyan 4.0 with 537.5 points and Baichuan Intelligence's "Bai Xiaoying" with 521 points. The papers used in the assessment were identical to those sat by candidates in Henan Province, where the published cutoff for first-tier undergraduate admission in liberal arts is 521 points. Doubao and the other two domestic models therefore all reached the first-tier line.


Generative AI large-model technology is currently in the early stages of commercialization. From individual work and daily life to production and creation across countless industries, more and more is gradually being empowered by large models.

At the same time, generative AI is still at an early stage of development, and whether it is "smart" enough remains a fundamental factor shaping the related technologies and product experiences.

Testing large models on college entrance exam questions is therefore an interesting and intuitive way to gauge their capabilities.

Let's take a closer look at how the different large models performed on the college entrance exam papers.

Language skills are a strength, and Doubao's essay earns praise

Looking at the subject-level results, the language exams, Chinese and English, are where large models can compete with human candidates; many products scored full or near-full marks on the objective questions.

With a home-field advantage in Chinese, domestic models took the top three places on the Chinese paper: Bai Xiaoying (128), ByteDance's Doubao (125.5) and Tencent Yuanbao (120.5). Apart from a few open-ended reading comprehension and language-use questions, the models lost most of their points on the essay.

Teacher Xia, who graded the Chinese essays for this assessment, is a Beijing municipal key teacher and the Chinese subject leader for Huairou District, and has marked national college entrance exam Chinese essays many times.

Teacher Xia commented: "Most of the AI essays have a clear and complete structure, sound logic and fluent language. However, they are too rational and lack sensibility and emotional color, so they naturally lack appeal."

Doubao's essay, however, received positive comments from an anonymous examiner:

The concerns the essay raises about employment structure and ethics show that Doubao has considerable depth of thought and critical thinking ability. After establishing the "problem", Doubao uses rhetorical questions as a natural transition into three parallel paragraphs that propose solutions, maintaining its "problem awareness" throughout. The passage that analyzes the problem from a developmental perspective and connects its root causes and harms to real life is a particular highlight. Overall, the essay is "rigorously structured, progressing layer by layer, with fluent sentences and a comprehensive understanding."

Writing is also the main weakness on the English paper. The evaluation assumed that every model earned the full 30 points for listening. On the two objective sections, reading and language use, GPT-4o, Bai Xiaoying and Tongyi Qianwen all earned the full 80 points, and Doubao and Wenxin Yiyan 4.0 came close.

On the 40-point writing section, however, the top score was only 29, achieved by both GPT-4o and Bai Xiaoying. The models mainly lost points for empty expression and a lack of detail. If large models can improve their writing, a full score on the English paper should not be hard to reach.

On the new-curriculum liberal arts paper covering history, geography and politics, GPT-4o scored 237 points, an average of 79 per subject, better than most human candidates. Among domestic products, Doubao scored highest in liberal arts with 224.5 points, including 82.5 in history, the best history score among all nine models.

In politics, GPT-4o somewhat unexpectedly took the top score of 88, while Bai Xiaoying and Doubao both scored above 80. The geography paper contained many image-based questions, a major challenge for most models. GPT-4o, with its strong image understanding, scored highest, but with only 68 points.

According to Henan's score statistics, GPT-4o's total of 562 would rank 8,811th among liberal arts candidates, placing it in the top 2.45% of human candidates. Doubao, the highest-scoring domestic AI after GPT-4o, scored 542.5 in total, 21.5 points above the liberal arts first-tier cutoff, placing it in the top 4.27%.

This shows that over the past year or so, China's AI technology capabilities have made great progress and are now close to the level of the world's top large models.

Science scores need improvement: AI is not omnipotent

In mathematics, physics, chemistry and the other science subjects, the large models fall far behind top human candidates. None of them, GPT-4o included, reached a passing score. Although they score highly in Chinese and English, even the best science result among the models would not place in the top 30% of human candidates.

Take the math paper: of the nine large model products, only GPT-4o, Wenxin Yiyan 4.0 and Doubao scored above 60 (out of 150). Current large models can only reason correctly through problems with relatively few, simple steps.

According to the evaluators, models such as Doubao can correctly apply derivative formulas and trigonometric theorems, but struggle to keep scoring on more complex derivation and proof questions.

The chemistry and physics papers, which focus on experimental and investigative skills, produced average scores of only 34 out of 100 and 39 out of 110, respectively. The top chemistry score went to Doubao with 49.5, while GPT-4o scored only 42.

The models are also less flexible than humans at handling exams. On one fairly easy physics question, for example, human candidates could rule out the wrong options simply because "time does not flow backwards" and readily choose the correct answer, C, yet almost every model got it wrong.

Large models still have a long way to go before they can think and solve problems the way humans do.

Even so, according to a McKinsey report, the value-creation potential of large models is astonishing: by 2030 they are expected to drive RMB 49 trillion in economic growth worldwide.

From technological innovation to commercial deployment, large models have already begun to power our daily work and lives and the AI transformation of many industries.

Generative AI does still have shortcomings, and there is a long road ahead. But I believe that, as the many generative AI technologies and products represented by the Doubao model continue to develop, a mere college entrance exam paper will no longer be a challenge for them, and they will deliver ever better answers across a wider range of applications.


