Alibaba open-sources 110 billion parameter Qwen1.5-110B model, comparable to Meta Llama3-70B

Alibaba recently announced the open-sourcing of Qwen1.5-110B, the first model in the Qwen1.5 series with over 100 billion parameters. The model is comparable to Meta-Llama3-70B in base-capability evaluations and performs well in chat evaluations, including MT-Bench and AlpacaEval 2.0.

Main content:

According to the announcement, Qwen1.5-110B uses the same Transformer decoder architecture as the other Qwen1.5 models. It includes grouped query attention (GQA), which makes inference more efficient. The model supports a context length of 32K tokens and remains multilingual, supporting English, Chinese, French, Spanish, German, Russian, Japanese, Korean, Vietnamese, Arabic, and other languages.
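For reference, below is a minimal sketch of loading the released checkpoint with Hugging Face transformers. The repository name Qwen/Qwen1.5-110B and the runtime settings (dtype, device map) are assumptions not stated in the article; a model of this size requires multiple GPUs.

```python
# Minimal sketch: load the Qwen1.5-110B base checkpoint with Hugging Face transformers.
# The repo name and settings below are assumptions, not details from the article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-110B"  # assumed Hugging Face repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # shard the weights across available GPUs
)

# Plain text continuation within the model's 32K-token context window.
prompt = "Qwen1.5-110B is a large language model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```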

Alibaba compared the Qwen1.5-110B model with the recent SOTA language models Meta-Llama3-70B and Mixtral-8x22B. The results are as follows:

[Table: base-capability benchmark results for Qwen1.5-110B, Meta-Llama3-70B, and Mixtral-8x22B]

The above results show that the new 110B model is at least comparable to Llama-3-70B in terms of base capabilities. Alibaba did not make major changes to the pre-training method for this model, so they attribute the performance improvement over the 72B model mainly to the increase in model size.

Alibaba also conducted a Chat evaluation on MT-Bench and AlpacaEval 2.0. The results are as follows:

[Table: Chat evaluation results on MT-Bench and AlpacaEval 2.0]

Alibaba said that in the benchmark evaluation of the two Chat models, the 110B model performs significantly better than the previously released 72B model. The consistent improvement in evaluation results suggests that a more powerful and larger base language model can lead to better Chat models, even without drastically changing the post-training approach.
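For completeness, a minimal sketch of querying the instruction-tuned variant through the tokenizer's built-in chat template is shown below. The repository name Qwen/Qwen1.5-110B-Chat and the generation settings are assumptions; the article only refers to the 110B Chat model.

```python
# Minimal sketch: chat with the instruction-tuned checkpoint via the chat template.
# The repo name Qwen/Qwen1.5-110B-Chat is an assumption, not stated in the article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-110B-Chat"  # assumed Hugging Face repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Format a single-turn conversation with the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize what grouped query attention is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```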

Finally, Alibaba noted that Qwen1.5-110B is the largest model in the Qwen1.5 series and the first in the series to exceed 100 billion parameters. It performs well against the recently released SOTA model Llama-3-70B and significantly outperforms the 72B model.
