The latest March 2024 edition of the "SuperBench Large Model Comprehensive Capability Evaluation Report", released by the Foundation Model Research Center of Tsinghua University, covers 14 influential models from China and abroad.
Wenxin 4.0 performed remarkably well in this evaluation. Its results are close to those of the top international models, and the gap between them is steadily narrowing, which makes it a leading model in China.
For example, in the human alignment evaluation, Wenxin 4.0 ranked first among domestic models.
In the Chinese reasoning and Chinese language tests, Wenxin 4.0 finished well ahead of the other models. Its lead is especially pronounced in Chinese comprehension, where it scored 0.41 points higher than GLM-4, which placed second.
In the mathematics portion of the semantic understanding assessment, Wenxin 4.0 and Claude-3 tied for first place worldwide, while the GPT-4 series models ranked fourth and fifth. The other models mostly scored around 55 points, well behind the leading group.
In the reading comprehension assessment, Wenxin 4.0 outscored GPT-4 Turbo, Claude-3 and GLM-4 to achieve the highest score.
In the safety assessment, the area companies care about most, Wenxin 4.0 also performed well: it surpassed the world-class GPT-4 series models and Claude-3 to earn the highest score (89.1 points), while Claude-3 ranked only fourth.
The data also show that since Wenxin Yiyan (ERNIE Bot) debuted on March 16 last year, its user base has exceeded 200 million, and its API now handles more than 200 million calls per day.