Yuanxiang XVERSE has released China's largest open-source MoE model, XVERSE-MoE-A36B.
The model has 255B total parameters and 36B activated parameters. The company claims it can "roughly reach" the performance of models with more than 100B parameters, a "cross-level" leap, while cutting training time by 30%, doubling inference performance, and sharply reducing the cost per token.
MoE (Mixture of Experts) is a model architecture that combines multiple expert models, each specialized in a different domain, into a single super-model. It scales up model size while keeping performance high, and can even reduce the computational cost of training and inference. MoE is already used in a number of large models, including Google's Gemini-1.5, OpenAI's GPT-4, and Grok from Musk's xAI.
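To illustrate why an MoE model can have far more total parameters than activated parameters, here is a minimal, illustrative sketch of a top-k routed MoE layer in PyTorch. The layer sizes, the linear router, and the simple feed-forward experts are assumptions for demonstration only, not XVERSE's actual implementation.

```python
# Illustrative top-k gated MoE layer (not XVERSE's actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is an independent feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for routing.
        tokens = x.reshape(-1, x.shape[-1])
        scores = self.router(tokens)                       # (tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)               # normalize over the chosen experts
        out = torch.zeros_like(tokens)
        # Only the selected experts run for each token, so the activated
        # parameter count stays far below the total parameter count.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (indices == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape_as(x)

# Example: 8 experts, only 2 active per token.
layer = SimpleMoELayer(d_model=64, d_ff=256, num_experts=8, top_k=2)
y = layer(torch.randn(2, 16, 64))
print(y.shape)  # torch.Size([2, 16, 64])
```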
In several benchmarks, Yuanxiang's MoE model outperforms a number of comparable models, including Skywork-MoE, a domestic 100-billion-parameter-scale MoE model; Mixtral-8x22B, the long-standing MoE leader; and Grok-1-A86B, a 314-billion-parameter open-source MoE model.
Related links:
- Hugging Face: https://huggingface.co/xverse/XVERSE-MoE-A36B
- ModelScope: https://modelscope.cn/models/xverse/XVERSE-MoE-A36B
- Github: https://github.com/xverse-ai/XVERSE-MoE-A36B
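For readers who want to try the released checkpoint, a minimal loading sketch with Hugging Face Transformers follows. It assumes a recent transformers install and enough GPU memory for the 255B-parameter weights; options such as `trust_remote_code` and `torch_dtype` are typical for such releases and should be checked against the model card.

```python
# Minimal sketch for loading XVERSE-MoE-A36B from Hugging Face (settings are assumptions;
# consult the official model card for the exact recommended usage).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "xverse/XVERSE-MoE-A36B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # half precision to reduce the memory footprint
    device_map="auto",            # shard the weights across available GPUs
    trust_remote_code=True,       # assumed: prior XVERSE releases ship custom modeling code
)

inputs = tokenizer("Briefly explain mixture-of-experts models.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```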