Tongyi Qianwen (Qwen) announced today that, after months of work, the Qwen series has been upgraded from Qwen1.5 to Qwen2, and the new models have been open-sourced simultaneously on Hugging Face and ModelScope.
The main updates in Qwen2 are as follows:
- Pre-trained and instruction-tuned models in 5 sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B;
- In addition to Chinese and English, high-quality data covering 27 more languages has been added to the training data;
- Leading performance on multiple benchmark evaluations;
- Significantly improved coding and mathematics capabilities;
- Extended context length support, up to 128K tokens (Qwen2-72B-Instruct).
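Since the checkpoints are published on Hugging Face and ModelScope, the instruction-tuned models can be used directly with the transformers library. The following is a minimal sketch, assuming the Hugging Face repository id Qwen/Qwen2-7B-Instruct and the standard chat-template API; other sizes follow the same pattern.

```python
# Minimal sketch: loading an instruction-tuned Qwen2 checkpoint with transformers.
# The repository id "Qwen/Qwen2-7B-Instruct" is assumed; swap it for other sizes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # requires `accelerate`; places weights on available GPUs
)

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
# Build the prompt with the model's own chat template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Strip the prompt tokens and decode only the newly generated reply.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```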
Basic model information
The Qwen2 series includes pre-trained and instruction-tuned models in 5 sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B.
Model | Qwen2-0.5B | Qwen2-1.5B | Qwen2-7B | Qwen2-57B-A14B | Qwen2-72B |
---|---|---|---|---|---|
Parameters | 0.49B | 1.54B | 7.07B | 57.41B | 72.71B |
Non-Embedding Parameters | 0.35B | 1.31B | 5.98B | 56.32B | 70.21B |
GQA | True | True | True | True | True |
Tie Embedding | True | True | False | False | False |
Context length | 32K | 32K | 128K | 64K | 128K |
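The split between "Parameters" and "Non-Embedding Parameters" in the table can be approximated by excluding the embedding and output-projection weights when counting. A rough sketch, assuming the usual transformers module names (embed_tokens, lm_head):

```python
# Rough sketch: reproducing the "Parameters" vs. "Non-Embedding Parameters" split.
# Module names ("embed_tokens", "lm_head") follow the usual transformers convention,
# assumed to apply here; tied embeddings are counted once by named_parameters().
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")

total = sum(p.numel() for _, p in model.named_parameters())
embedding = sum(
    p.numel()
    for name, p in model.named_parameters()
    if "embed_tokens" in name or "lm_head" in name
)

print(f"total: {total / 1e9:.2f}B, non-embedding: {(total - embedding) / 1e9:.2f}B")
```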
In the Qwen1.5 series, only the 32B and 110B models used GQA. This time, models of all sizes use GQA, so everyone can benefit from its faster inference and lower GPU memory usage.
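Whether a checkpoint uses GQA can be read off its configuration: the number of key/value heads is smaller than the number of query heads. A minimal sketch, assuming the config fields num_attention_heads and num_key_value_heads exposed by transformers:

```python
# Minimal sketch: checking for grouped-query attention (GQA) in a model config.
# Field names (num_attention_heads / num_key_value_heads) follow the transformers
# convention and are assumed to be present on the Qwen2 configuration.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2-7B")
print("query heads:", cfg.num_attention_heads)
print("key/value heads:", cfg.num_key_value_heads)
print("uses GQA:", cfg.num_key_value_heads < cfg.num_attention_heads)
```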
Model Evaluation
Compared with Qwen1.5, Qwen2 achieves substantially better performance at large model scales. We conducted a comprehensive evaluation of Qwen2-72B.
In evaluations of the pre-trained base model, Qwen2-72B significantly outperforms leading open-source models such as Llama-3-70B, as well as Qwen1.5's largest model, Qwen1.5-110B, across a wide range of capabilities including natural language understanding, knowledge, coding, mathematics, and multilingual ability.