"World's First" Single RTX 4090 Server Inference, Kunlun Wanwei Open Source 200 Billion Sparse Large Model Tiangong MoE

Kunlun Wanwei today announced the open-sourcing of its 200-billion-parameter sparse large model Skywork-MoE. The model was expanded from an intermediate checkpoint of the previously open-sourced Skywork-13B. It is said to be the first open-source hundred-billion-parameter-scale MoE large model to fully apply and implement the MoE Upcycling technique, and also the first to support inference on a single RTX 4090 server (8 RTX 4090 GPUs).
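MoE Upcycling refers to initializing a sparse mixture-of-experts model from an existing dense checkpoint rather than training it from scratch. As a rough illustration only (not Skywork's actual procedure), each expert can start life as a copy of the dense model's feed-forward block; the hypothetical helper below sketches that step in PyTorch:

```python
import copy
import torch.nn as nn

def upcycle_ffn_to_moe(dense_ffn: nn.Module, n_experts: int = 16) -> nn.ModuleList:
    """Sketch of MoE Upcycling: every expert begins as an identical copy of
    the dense checkpoint's feed-forward block, then the experts diverge
    during continued training. Illustrative assumption, not Skywork's code."""
    return nn.ModuleList(copy.deepcopy(dense_ffn) for _ in range(n_experts))
```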

According to the announcement, the open-sourced Skywork-MoE model belongs to the Tiangong 3.0 R&D model series and is the medium-sized variant (Skywork-MoE-Medium): 146B total parameters, 22B activated parameters, 16 experts of 13B each, with 2 experts activated per token.
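For context on what "2 of 16 experts activated per token" means in practice, below is a minimal, generic sketch of a top-2 MoE feed-forward layer in PyTorch. The `Top2MoELayer` name and layer sizes are illustrative assumptions, not Skywork's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Generic top-2 mixture-of-experts feed-forward layer (sketch only).

    Mirrors the reported Skywork-MoE configuration (16 experts, 2 active
    per token) but is not the model's actual code.
    """
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 16, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router scores all experts, but only the
        # top-2 per token are actually run, which is why activated parameters
        # (22B) are far fewer than total parameters (146B).
        probs = F.softmax(self.gate(x), dim=-1)
        weights, idx = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize top-2
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * self.experts[e](x[mask])
        return out
```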

Tiangong 3.0 also trained two other MoE models, a 75B (Skywork-MoE-Small) and a 400B (Skywork-MoE-Large), neither of which is included in this open-source release.

According to official tests, at the same activated-parameter scale of about 20B (i.e., the same inference compute), Skywork-MoE's capability approaches that of a 70B dense model, cutting inference cost to roughly one third. At the same time, Skywork-MoE's total parameter count is about one third smaller than DeepSeek-V2's, achieving comparable capability at a smaller model size.
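A quick back-of-the-envelope check of those two claims, assuming inference cost scales roughly linearly with activated parameter count and using DeepSeek-V2's publicly reported 236B total parameters:

```python
# Assumption: inference compute ~ activated parameters.
skywork_active, dense_70b = 22e9, 70e9
print(f"cost ratio vs 70B dense: {dense_70b / skywork_active:.1f}x")  # ~3.2x cheaper

skywork_total, deepseek_v2_total = 146e9, 236e9  # 236B per DeepSeek-V2's report
print(f"size vs DeepSeek-V2: {skywork_total / deepseek_v2_total:.0%}")  # ~62%, i.e. ~1/3 smaller
```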

"World's First" Single RTX 4090 Server Inference, Kunlun Wanwei Open Source 200 Billion Sparse Large Model Tiangong MoE

Skywork-MoE's model weights and technical report are fully open source and free for commercial use, with no application required. The links are as follows:

Model weights download:

https://huggingface.co/Skywork/Skywork-MoE-base

https://huggingface.co/Skywork/Skywork-MoE-Base-FP8

Model open source repository: https://github.com/SkyworkAI/Skywork-MoE

Model technical report: https://github.com/SkyworkAI/Skywork-MoE/blob/main/skywork-moe-tech-report.pdf

Model inference code (supports 8-bit quantized inference on an 8×RTX 4090 server): https://github.com/SkyworkAI/vllm
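As a hedged illustration of how the forked vLLM repository might be invoked, the sketch below uses upstream vLLM's standard `LLM` entry point with 8-way tensor parallelism; the exact quantization flags supported by the SkyworkAI fork may differ:

```python
from vllm import LLM, SamplingParams

# Sketch only: assumes the SkyworkAI/vllm fork keeps upstream vLLM's API.
# tensor_parallel_size=8 shards the model across the 8 RTX 4090 GPUs;
# quantization="fp8" is paired here with the Skywork-MoE-Base-FP8 checkpoint.
llm = LLM(
    model="Skywork/Skywork-MoE-Base-FP8",
    tensor_parallel_size=8,
    quantization="fp8",
    trust_remote_code=True,
)

outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```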
