Alibaba Tongyi Qianwen open-sources the Qwen1.5-MoE-A2.7B model

The Tongyi Qianwen team has released the first MoE model in the Qwen series, named Qwen1.5-MoE-A2.7B. The model activates only 2.7 billion parameters, yet its performance is on par with current top 7B models such as Qwen1.5-7B. Compared with Qwen1.5-7B, Qwen1.5-MoE-A2.7B has only 2.0 billion non-embedding parameters, roughly one third of the original model's size. In addition, Qwen1.5-MoE-A2.7B cuts training cost by 75% and improves inference speed by a factor of 1.74 relative to Qwen1.5-7B.


The Qwen1.5-MoE model uses a specially designed MoE architecture. Unlike conventional MoE approaches, Qwen1.5-MoE employs 64 fine-grained experts, a design also adopted by recent models such as DeepSeek-MoE and DBRX, together with a new routing mechanism. The fine-grained design splits a standard FFN into smaller experts, producing more experts without increasing the total number of parameters. As a result, the Qwen1.5-MoE model performs well in terms of training cost and inference efficiency, with performance close to the best 7B models; a minimal sketch of the idea follows below.
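To make the fine-grained experts idea concrete, here is a minimal PyTorch sketch of a top-k routed MoE feed-forward layer. The hidden sizes, expert count, top-k value, and the absence of shared experts are illustrative assumptions for this article, not the official Qwen1.5-MoE implementation.

```python
# Minimal sketch of a fine-grained MoE feed-forward layer.
# Sizes, top-k, and the lack of shared experts are assumptions,
# NOT the official Qwen1.5-MoE configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FineGrainedMoE(nn.Module):
    def __init__(self, d_model=1024, d_expert=256, num_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each fine-grained expert is a small FFN, i.e. a slice of a full-size FFN.
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(d_model, d_expert),
                    nn.SiLU(),
                    nn.Linear(d_expert, d_model),
                )
                for _ in range(num_experts)
            ]
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x)                        # (B, S, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)           # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Naive dispatch loop: route each token to its selected experts.
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[..., k] == e                # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * self.experts[e](x[mask])
        return out


if __name__ == "__main__":
    layer = FineGrainedMoE()
    y = layer(torch.randn(2, 16, 1024))
    print(y.shape)  # torch.Size([2, 16, 1024])
```

The key point of the fine-grained design is visible in the sizes: each expert is only a small slice of a full FFN, so 64 of them hold roughly the same parameter budget as a handful of conventional experts while giving the router many more combinations to activate per token.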

The Qwen1.5-MoE-A2.7B model has 14.3 billion total parameters, of which 2.7 billion are activated per token (2.0 billion excluding embeddings), and it reduces training cost by 75%. In experiments on a single NVIDIA A100-80G GPU, its inference speed improved by about 1.74 times. The Qwen1.5-MoE model has been open-sourced in the ModelScope community and can be downloaded and used directly.
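Since the weights are published on ModelScope, the model can be loaded with the usual transformers-style API. The snippet below is a hypothetical quick-start sketch: the repository id, required library versions (modelscope, transformers, accelerate), and generation settings are assumptions based on the demo link further down, so check the model card before use.

```python
# Hypothetical quick-start sketch for the chat variant on ModelScope.
# The model id and settings are assumptions; verify them against the model card.
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_id = "qwen/Qwen1.5-MoE-A2.7B-Chat"  # assumed ModelScope repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Briefly introduce the Qwen1.5-MoE model."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens and print only the newly generated answer.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```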

Beyond the performance and efficiency gains, the Qwen1.5-MoE model will continue to receive updates, including support for third-party frameworks such as llama.cpp and MLX.

Overall, the Qwen1.5-MoE model delivers clear advantages in performance, efficiency, and inference speed, and represents one of the best practices for efficient training and inference.

Qwen1.5-MoE demo link:

https://modelscope.cn/studios/qwen/qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4-demo
