Moonshot AI's Kimi Open-Sources Moonlight: a Mixture-of-Experts Model with 3 Billion Activated / 16 Billion Total Parameters
February 24, 2025 - Moonshot AI, the company behind Kimi, yesterday released a new technical paper, "Muon is Scalable for LLM Training", and announced the launch of Moonlight: a Mixture-of-Experts (MoE) model with 3 billion activated and 16 billion total parameters, trained with the Muon optimizer. Trained on 5.7 trillion tokens, Moonlight achieves better performance at a lower floating-point operation (FLOP) budget, advancing the Pareto frontier of performance versus training compute. Moonshot AI says the team found that the Muon optimizer can be carefully tuned by adding weight decay, …
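For readers unfamiliar with Muon: it is an optimizer that orthogonalizes the momentum of each 2-D weight matrix before applying it. Below is a minimal sketch of a Muon-style update step with decoupled weight decay, following the open-source Muon reference implementation; the function names, hyperparameter values, and the shape-dependent scale factor are illustrative assumptions, not the paper's exact settings.

```python
import torch

def newton_schulz5(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2-D matrix G with a quintic
    Newton-Schulz iteration (coefficients from the open-source Muon
    reference implementation)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.float()
    X = X / (X.norm() + eps)           # normalize so the iteration converges
    transposed = G.size(0) > G.size(1)
    if transposed:                     # iterate on the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    if transposed:
        X = X.T
    return X.to(G.dtype)

@torch.no_grad()
def muon_step(W, grad, buf, lr=0.02, mu=0.95, wd=0.1):
    """One Muon-style update on a 2-D weight matrix W with decoupled
    weight decay. Hyperparameters here are placeholders."""
    buf.mul_(mu).add_(grad)            # momentum accumulation
    O = newton_schulz5(buf)            # orthogonalized update direction
    # Shape-dependent rescaling so update magnitudes stay comparable
    # across layers (an illustrative choice, not the paper's exact rule).
    scale = 0.2 * max(W.size(0), W.size(1)) ** 0.5
    W.mul_(1.0 - lr * wd)              # decoupled weight decay
    W.add_(O, alpha=-lr * scale)
    return W, buf
```

In practice, Muon implementations apply this update only to 2-D hidden-layer weight matrices and fall back to an optimizer such as AdamW for embeddings, output heads, and scalar parameters.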