The Tongyi Qianwen team has launched the first MoE model in the Qwen series, Qwen1.5-MoE-A2.7B. With only 2.7 billion activated parameters, it matches the performance of current state-of-the-art 7B models such as Qwen1.5-7B. Compared with Qwen1.5-7B, Qwen1.5-MoE-A2.7B has only 2.0 billion non-embedding parameters, about one-third the size of the original model. In addition, it reduces training cost by 75% and improves inference speed by a factor of 1.74 relative to Qwen1.5-7B.
The Qwen1.5-MoE model adopts a specially designed MoE architecture. Unlike conventional MoE approaches, Qwen1.5-MoE uses 64 fine-grained experts and introduces a new routing mechanism, drawing on designs such as DeepSeek-MoE and DBRX. The fine-grained experts design creates more experts without increasing the total number of parameters. As a result, Qwen1.5-MoE performs well in terms of training cost and inference efficiency, with performance close to that of state-of-the-art 7B models.
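To make the fine-grained experts idea concrete, the sketch below shows a generic top-k routed MoE feed-forward layer that spreads capacity across many small experts, so that only a fraction of parameters is activated per token. It is illustrative only: the hidden size, expert width, expert count, top-k value, and the absence of shared experts are assumptions, not the actual Qwen1.5-MoE configuration.

```python
# Minimal sketch of a top-k routed MoE feed-forward layer with fine-grained experts.
# All sizes below are illustrative assumptions, not Qwen1.5-MoE's real configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FineGrainedMoE(nn.Module):
    def __init__(self, hidden_size=1024, expert_intermediate=256,
                 num_experts=64, top_k=4):
        super().__init__()
        # Fine-grained design: many narrow experts instead of a few wide ones,
        # keeping total parameters comparable while giving the router more choices.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, expert_intermediate),
                nn.SiLU(),
                nn.Linear(expert_intermediate, hidden_size),
            )
            for _ in range(num_experts)
        )
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, hidden_size)
        logits = self.router(x)                # (tokens, num_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the selected experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, which is why the number of
        # activated parameters is much smaller than the total parameter count.
        for k in range(self.top_k):
            for e in idx[:, k].unique():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k:k + 1] * self.experts[int(e)](x[mask])
        return out

tokens = torch.randn(8, 1024)
print(FineGrainedMoE()(tokens).shape)          # torch.Size([8, 1024])
```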
The Qwen1.5-MoE-A2.7B model has 14.3 billion total parameters, of which only 2.7 billion are activated and 2.0 billion are non-embedding parameters, and it cuts training cost by 75%. In experiments on a single NVIDIA A100-80G GPU, its inference speed improved by about 1.74x. The Qwen1.5-MoE model has been open-sourced in the ModelScope community and can be downloaded and used directly.
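As a minimal sketch of the "download and use directly" workflow, the example below fetches the chat variant from the ModelScope hub and runs one generation with transformers. The repository id qwen/Qwen1.5-MoE-A2.7B-Chat and the use of the generic Auto* classes are assumptions; a transformers release with Qwen1.5-MoE support and the modelscope package are required.

```python
# Sketch: download the model from ModelScope, then generate with transformers.
# The repository id below is an assumption based on the demo link in this article.
from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = snapshot_download("qwen/Qwen1.5-MoE-A2.7B-Chat")   # local cache path
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir,
                                             device_map="auto",
                                             torch_dtype="auto")

messages = [{"role": "user", "content": "Briefly introduce mixture-of-experts models."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```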
Beyond performance and efficiency, the Qwen1.5-MoE model will continue to gain support in third-party frameworks, including llama.cpp, MLX, and others.
Overall, the Qwen1.5-MoE model delivers clear advantages in performance, efficiency, and inference speed, making it one of the best current practices for efficient training and inference.
Qwen1.5-MoE demo link:
https://modelscope.cn/studios/qwen/qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4-demo