Billed as "comparable to human experts", Google's math-specialized Gemini 1.5 Pro gets smarter: 91.1% accuracy on the MATH benchmark

Google released a technical report last week saying that the Gemini 1.5 Pro model significantly improved its math scores after specialized training in mathematics, and that it successfully solved some International Mathematical Olympiad problems.

Google trained the Gemini 1.5 Pro model specifically for mathematical scenarios and tested it with the MATH benchmark, the American Invitational Mathematics Examination (AIME), and Google's internal HiddenMath benchmark.

According to Google, the math-specialized Gemini 1.5 Pro performs “on par with human experts” on math benchmarks: it solves significantly more problems on the AIME benchmark than the standard, non-math Gemini 1.5 Pro, and it also improves its scores on the other benchmarks.

Of the three examples shared by Google, two were solved correctly by the math-specific Gemini 1.5 Pro, while one was solved incorrectly by the standard Gemini 1.5 Pro variant. These problems typically require the solver to recall basic algebraic formulas and to reach the correct answer by breaking them down and applying other mathematical rules.

In addition to the example problems, Google also shared details of the Gemini 1.5 Pro benchmark results, which show Gemini 1.5 Pro ahead of GPT-4 Turbo and Anthropic's Claude on all five benchmark scores.

Google said that the math-specialized Gemini 1.5 Pro reaches 80.6% accuracy on the MATH benchmark with a single sample, and 91.1% when 256 solutions are sampled and one candidate answer is selected from them (rm@256).
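
The article does not describe how the candidate answer is selected. As a rough illustration only, the sketch below assumes "rm" refers to a scoring (reward) model that rates each sampled solution, with the top-rated one kept. The names `select_best_of_n`, `toy_sampler`, and `toy_scorer` are hypothetical stand-ins and are not taken from Google's report.

```python
import random
from typing import Callable, List


def select_best_of_n(
    sample_solution: Callable[[], str],
    score: Callable[[str], float],
    n: int = 256,
) -> str:
    """Draw n candidate solutions and return the one the scorer rates highest."""
    candidates: List[str] = [sample_solution() for _ in range(n)]
    return max(candidates, key=score)


# Toy stand-ins (assumptions for illustration, not a real model or reward model).
rng = random.Random(0)


def toy_sampler() -> str:
    # Pretend model: usually answers "4", sometimes slips to "3" or "5".
    return rng.choice(["4", "4", "4", "5", "3"])


def toy_scorer(answer: str) -> float:
    # Pretend scorer: prefers the answer "4".
    return 1.0 if answer == "4" else 0.0


best = select_best_of_n(toy_sampler, toy_scorer, n=256)
```

In this sketch a single sample can be wrong, but choosing among many samples recovers the correct answer whenever at least one candidate is right and the scorer ranks it first. That is one plausible reading of why the reported 256-sample accuracy is higher than the single-sample figure.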
