Alibaba Cloud upgrades its open-source Tongyi Qianwen (Qwen) AI models to Qwen2: 5 sizes, context length of up to 128K tokens

The Tongyi Qianwen (Qwen) team announced today that, after months of work, the Qwen series has received a major upgrade from Qwen1.5 to Qwen2. The new models have been open-sourced simultaneously on Hugging Face and ModelScope.

The main updates in Qwen2 are as follows:

  • Pre-trained and instruction fine-tuned models in 5 sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B;

  • In addition to Chinese and English, high-quality data in 27 more languages has been added to the training data;

  • Leading performance on multiple benchmarks;

  • Significant improvement in coding and math skills;

  • Longer context support, up to 128K tokens (Qwen2-72B-Instruct).

Basic model information

The Qwen2 series includes pre-trained and instruction fine-tuned models in 5 sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B.

Model                     Qwen2-0.5B  Qwen2-1.5B  Qwen2-7B  Qwen2-57B-A14B  Qwen2-72B
Parameters                0.49B       1.54B       7.07B     57.41B          72.71B
Non-embedding parameters  0.35B       1.31B       5.98B     56.32B          70.21B
GQA                       True        True        True      True            True
Tied embeddings           True        True        False     False           False
Context length            32K         32K         128K      64K             128K
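
The difference between the two parameter rows in the table above is the size of each model's embeddings. The following is a minimal Python sketch of that arithmetic. It uses only the figures from the table; the dictionary and function names are illustrative and do not come from the Qwen release.

```python
from typing import Tuple

# Parameter counts in billions, copied from the table above.
TOTAL_PARAMS = {
    "Qwen2-0.5B": 0.49,
    "Qwen2-1.5B": 1.54,
    "Qwen2-7B": 7.07,
    "Qwen2-57B-A14B": 57.41,
    "Qwen2-72B": 72.71,
}
NON_EMBEDDING_PARAMS = {
    "Qwen2-0.5B": 0.35,
    "Qwen2-1.5B": 1.31,
    "Qwen2-7B": 5.98,
    "Qwen2-57B-A14B": 56.32,
    "Qwen2-72B": 70.21,
}


def embedding_share(model: str) -> Tuple[float, float]:
    """Return (embedding parameters in billions, fraction of total parameters)."""
    total = TOTAL_PARAMS[model]
    embedding = total - NON_EMBEDDING_PARAMS[model]
    return round(embedding, 2), round(embedding / total, 3)


if __name__ == "__main__":
    for name in TOTAL_PARAMS:
        emb, frac = embedding_share(name)
        print(f"{name:<16} embedding ~{emb:.2f}B ({frac:.1%} of total)")
```

By these figures, embeddings make up roughly 29% of Qwen2-0.5B but only about 3% of Qwen2-72B. That is consistent with tied embeddings being enabled only on the two smallest models, although the article itself does not state the reason.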

In the Qwen1.5 series, only the 32B and 110B models used GQA (grouped-query attention). This time, models of every size use GQA, so that all users benefit from its faster inference and lower GPU memory usage.
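
The memory saving from GQA comes mainly from the key/value cache, whose size scales with the number of key/value heads rather than the number of query heads. Below is a rough Python sketch of that calculation. The layer count, head counts, head dimension, and sequence length are hypothetical values chosen for illustration; they are not the actual Qwen2 configuration, which the article does not give.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """KV cache size for one sequence: keys plus values for every layer.

    bytes_per_value=2 assumes 16-bit (fp16/bf16) cache entries.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration, NOT taken from the Qwen2 release.
NUM_LAYERS = 32
NUM_QUERY_HEADS = 32      # standard multi-head attention keeps one KV head per query head
NUM_KV_HEADS_GQA = 8      # GQA shares each KV head across a group of 4 query heads
HEAD_DIM = 128
SEQ_LEN = 32 * 1024       # 32K tokens

mha_bytes = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
gqa_bytes = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS_GQA, HEAD_DIM, SEQ_LEN)

if __name__ == "__main__":
    gib = 2 ** 30
    print(f"Multi-head attention KV cache: {mha_bytes / gib:.1f} GiB")
    print(f"Grouped-query attention KV cache: {gqa_bytes / gib:.1f} GiB")
    print(f"Reduction factor: {mha_bytes // gqa_bytes}x")
```

Under these assumed numbers the cache shrinks by the ratio of query heads to key/value heads (4x here), and the saving grows linearly with context length.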

Model Evaluation

Compared with Qwen1.5, Qwen2 delivers a significant performance improvement at large model scales. The team conducted a comprehensive evaluation of Qwen2-72B.

In the evaluation of pre-trained language models, Qwen2-72B significantly surpasses current leading open-source models, such as Llama-3-70B and Qwen1.5-110B (the largest Qwen1.5 model), in many capabilities, including natural language understanding, knowledge, code, mathematics, and multilingualism.
