Moonshot AI's Kimi Open-Sources Moonlight: a 3-Billion/16-Billion-Parameter Mixture-of-Experts Model

February 24th. Yesterday, Moonshot AI (Kimi) released a new technical report, "Muon is Scalable for LLM Training," and announced the launch of Moonlight: a 3-billion/16-billion-parameter Mixture-of-Experts (MoE) model trained with Muon. Trained on 5.7 trillion tokens, it achieves better performance at a lower floating-point operation (FLOPs) budget, pushing out the Pareto efficiency frontier.

Moonshot AI says the team found that the Muon optimizer can be scaled up with techniques such as adding weight decay and carefully tuning the update magnitude of each parameter. The highlights are as follows:

  • These techniques allow Muon to be used out of the box for large-scale training, with no hyperparameter tuning required. Scaling-law experiments show that Muon achieves roughly 2x the computational efficiency of AdamW at compute-optimal training.

The model presented in the report is Moonlight-16B-A3B, with 15.29B total parameters and 2.24B activated parameters; it was trained with the Muon optimizer on 5.7T tokens to obtain the results above.

  • Our model not only pushes beyond the current Pareto frontier, but also achieves better performance than previous models while requiring significantly fewer training FLOPs.
  • We open-source a distributed version of our Muon implementation that is optimized for both memory usage and communication efficiency. We have also released pre-trained models, instruction-tuned models, and intermediate training checkpoints to support future research.
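The two modifications described above, decoupled weight decay and a per-parameter adjustment of the update magnitude, can be illustrated with a minimal single-matrix sketch. This is not Moonshot's released implementation: the Newton-Schulz coefficients, the 0.2 * sqrt(max(fan_in, fan_out)) scale factor, and the hyperparameter values are assumptions based on common Muon practice.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately orthogonalize G via a Newton-Schulz iteration.

    The quintic coefficients below are the ones commonly used in public
    Muon implementations; they are an assumption, not taken from the report.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)  # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # keep the smaller dimension first for cheaper matmuls
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X

def muon_step(param, grad, momentum, lr=0.02, beta=0.95, weight_decay=0.1):
    """One Muon step with the report's two additions:
    decoupled weight decay and a per-matrix update-scale adjustment."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    # Scale the orthogonalized update so its RMS roughly matches AdamW's;
    # the 0.2 * sqrt(max(dims)) factor is an assumed convention.
    scale = 0.2 * np.sqrt(max(param.shape))
    param = param - lr * (scale * update + weight_decay * param)
    return param, momentum
```

Because the update scale is matched to AdamW's, the same learning rate and weight decay schedule can be reused across parameter matrices of different shapes, which is what makes the optimizer usable "out of the box" without retuning.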

