Alibaba Qwen Open-Sources Visual Reasoning Model QVQ-72B-Preview: Think Like a Physicist

Alibaba's Tongyi Qianwen (Qwen) team published a blog post today (December 25) announcing the open-source release of QVQ-72B-Preview, a visual reasoning model built on Qwen2-VL-72B. Like a master physicist, the model can work through complex physics problems with calm, methodical logical reasoning.


The Qwen team evaluated QVQ-72B-Preview on four datasets, which 1AI summarizes below:

  • MMMU: a university-level, multidisciplinary, multimodal evaluation set designed to test a model's integrated visual understanding and reasoning.
  • MathVista: a collection of math-related visual reasoning tests that assesses logical reasoning on puzzle diagrams, algebraic reasoning on function plots, and scientific reasoning on figures from academic papers.
  • MathVision: a collection of high-quality multimodal mathematical reasoning problems drawn from real math competitions, offering greater question diversity and subject breadth than MathVista.
  • OlympiadBench: an Olympiad-level bilingual multimodal science benchmark containing 8,476 problems from Olympiad mathematics and physics competitions, including the Chinese Gaokao. Each problem comes with expert-level annotations detailing the step-by-step reasoning.

Test results show that QVQ-72B-Preview scored 70.3 on the MMMU benchmark, significantly outperforming Qwen2-VL-72B-Instruct. The model also performed well on the three remaining benchmarks, which focus on math and science problems, effectively narrowing the gap with OpenAI's state-of-the-art o1 model.

The Qwen team also noted that QVQ-72B-Preview is an experimental research model focused on enhancing visual reasoning. Although it performed beyond expectations, it still has several limitations to be aware of:

  • Language mixing and code-switching: the model may unexpectedly mix languages or switch between them, which can reduce the clarity of its responses.
  • Recursive reasoning: the model may fall into circular logic patterns, producing lengthy responses without reaching a conclusion.
  • Safety and ethical considerations: the model needs stronger safety measures to ensure reliable and safe performance, and users should exercise caution when deploying it.
  • Performance and benchmark limitations: although the model's visual reasoning has improved, it cannot fully replace the capabilities of Qwen2-VL-72B. In addition, during multi-step visual reasoning the model may gradually lose focus on the image content, leading to hallucinations.
