Against the backdrop of rapid development in large models, Kunlun Wanwei has open-sourced Skywork-MoE, a landmark sparse large language model. The model not only delivers strong performance but also significantly reduces inference cost, offering an effective answer to the challenges posed by large dense LLMs.
Skywork-MoE model features:
Open source and free for commercial use: Skywork-MoE's model weights and technical report are fully open source and free for commercial use, with no application required.
Reduced inference cost: This model significantly reduces the inference cost while maintaining strong performance.
Sparse model: Skywork-MoE is a mixture-of-experts (MoE) model that offers a more economical alternative by routing computation to specialized sub-models, or "experts".
Supports inference on a single 4090 server: It is the first open-source MoE large model that supports inference on a single server equipped with RTX 4090 GPUs.
Technical details:
Model weights and open source repository: Model weights can be downloaded from Hugging Face, and the open source repository is located on GitHub.
Inference code: Code is provided for loading the model with 8-bit quantization and running inference on an 8x RTX 4090 server (see the sketch after this list).
Performance: On an 8x RTX 4090 server, using the non-uniform Tensor Parallel inference method pioneered by the Kunlun Wanwei team, Skywork-MoE achieves a throughput of 2,200 tokens/s.
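To illustrate the 8-bit quantized loading path, here is a minimal sketch using the Hugging Face transformers and bitsandbytes libraries. The exact arguments, the multi-GPU device map, and the generation call are assumptions made for illustration; the inference code in the official repository should be treated as authoritative.

```python
# Minimal sketch of 8-bit quantized loading, assuming the standard
# transformers + bitsandbytes path; not the official Skywork-MoE script.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Skywork/Skywork-MoE-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights
    device_map="auto",        # spread layers across the available GPUs
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```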
Model performance and technological innovation:
Parameter size: Skywork-MoE has 146B total parameters and 22B activated parameters, with 16 experts of 13B each.
Performance comparison: At the same activated parameter count, Skywork-MoE sits at the forefront of the industry, approaching the capability of a 70B dense model while cutting inference cost by nearly 3x.
Training optimization algorithms: Skywork-MoE introduces two training optimization techniques, gating logits normalization and an adaptive auxiliary (aux) loss, to address the difficulty of training MoE models and their poor generalization (a sketch of both ideas follows below).
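The following is an illustrative sketch of the two ideas named above, not the official implementation: (1) standardizing the gating logits before the softmax, and (2) a standard MoE load-balancing auxiliary loss whose weight would be adapted during training. Tensor shapes, the scale factor `gamma`, and the adaptation rule are assumptions made for illustration.

```python
# Sketch under assumptions stated in the lead-in; see the technical report
# for the actual formulation used in Skywork-MoE.
import torch
import torch.nn.functional as F

def normalized_gate_probs(logits: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    """Standardize gate logits per token, rescale, then apply softmax."""
    mean = logits.mean(dim=-1, keepdim=True)
    std = logits.std(dim=-1, keepdim=True) + 1e-6
    return F.softmax(gamma * (logits - mean) / std, dim=-1)

def load_balance_aux_loss(probs: torch.Tensor, expert_index: torch.Tensor,
                          num_experts: int) -> torch.Tensor:
    """Classic MoE auxiliary loss: encourages uniform expert utilization."""
    routed = F.one_hot(expert_index, num_experts).float().mean(dim=0)  # hard routing fraction
    avg_prob = probs.mean(dim=0)                                       # soft routing fraction
    return num_experts * torch.sum(routed * avg_prob)

# Toy usage: 8 tokens routed over 16 experts with top-1 selection.
logits = torch.randn(8, 16)
probs = normalized_gate_probs(logits, gamma=1.0)
aux = load_balance_aux_loss(probs, probs.argmax(dim=-1), num_experts=16)
# In training, the aux-loss weight would be adjusted over time (e.g. raised
# when routing becomes unbalanced), which is the "adaptive" part described above.
```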
Large-scale distributed training:
Expert Data Parallel: A new parallel design scheme is proposed that partitions the model efficiently even when the number of experts is small.
Non-uniform splitting and pipeline parallelism: A non-uniform pipeline-parallel splitting and recomputation-layer allocation method is proposed to balance the compute and GPU-memory load across stages (see the sketch after this list).
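Below is an illustrative sketch of what a non-uniform pipeline split can look like, not Kunlun Wanwei's implementation. The idea shown is that the first and last stages also carry the embedding and LM head, so they are assigned fewer transformer layers than the middle stages; the function name and layer counts are hypothetical.

```python
# Hypothetical helper for illustration only; the real partition in
# Skywork-MoE's training stack is not described at this level in the article.
def non_uniform_pipeline_split(num_layers: int, num_stages: int,
                               light_first_last: int = 1) -> list[list[int]]:
    """Assign layer indices to pipeline stages, giving the first and last
    stages `light_first_last` fewer layers each to offset embedding/head cost."""
    base = num_layers // num_stages
    counts = [base] * num_stages
    counts[0] -= light_first_last
    counts[-1] -= light_first_last
    # Redistribute the removed (and any leftover) layers across middle stages.
    leftover = num_layers - sum(counts)
    for i in range(leftover):
        counts[1 + i % max(num_stages - 2, 1)] += 1
    stages, start = [], 0
    for c in counts:
        stages.append(list(range(start, start + c)))
        start += c
    return stages

# Example: 52 layers over 8 stages -> stages 0 and 7 hold fewer layers.
print(non_uniform_pipeline_split(52, 8))
```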
Experiments and Rules of Thumb:
Scaling Law experiments: Explore the factors that constrain the quality of MoE models trained via Upcycling versus From Scratch.
Training rule of thumb: If the FLOPs budget for training the MoE model is more than 2x that of training the Dense model, train the MoE From Scratch; otherwise, use Upcycling to reduce training cost (see the snippet below).
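The rule of thumb above can be encoded in a few lines. The 2x threshold is the one stated in the article; the function name and its inputs are a hypothetical helper for illustration.

```python
# Minimal encoding of the stated rule of thumb; not an official utility.
def choose_moe_training_strategy(moe_train_flops: float,
                                 dense_train_flops: float) -> str:
    """Return 'from_scratch' when the MoE training budget exceeds 2x the
    dense model's training FLOPs, otherwise 'upcycling'."""
    return "from_scratch" if moe_train_flops > 2 * dense_train_flops else "upcycling"

# Example: a budget of 3x the dense model's FLOPs -> train from scratch.
print(choose_moe_training_strategy(moe_train_flops=3.0, dense_train_flops=1.0))
```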
The open sourcing of Skywork-MoE brings a powerful new tool to the large model community, helping to advance the field of artificial intelligence, especially in scenarios that require processing large amounts of data and where computational resources are limited.
Project page: https://top.aibase.com/tool/skywork-moe
Model download address: https://huggingface.co/Skywork/Skywork-MoE-Base