Microsoft has released the Phi-3.5 series of AI models. The most notable addition is Phi-3.5-MoE, the series' first Mixture-of-Experts (MoE) model.
The Phi-3.5 series comprises three lightweight AI models, Phi-3.5-MoE, Phi-3.5-vision, and Phi-3.5-mini, built on synthetic data and filtered public web pages, each with a 128K context window. All of them are now available on Hugging Face under the MIT license. IT Home has attached the relevant descriptions below:
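Since the checkpoints are published on Hugging Face, they can be loaded with the standard transformers workflow. The sketch below shows the general pattern; the repo id "microsoft/Phi-3.5-mini-instruct" and the chat-template usage are assumptions based on the Phi naming convention, not something stated in this article.

```python
# Minimal sketch: load a Phi-3.5 checkpoint from Hugging Face and generate a reply.
# The repo id below is an assumption based on the Phi naming convention.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use bf16/fp16 if the hardware supports it
    device_map="auto",       # place the weights on available devices
    trust_remote_code=True,  # Phi repos ship custom modeling code
)

messages = [{"role": "user", "content": "Summarize the Phi-3.5 release in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```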
Phi-3.5-MoE: the first Mixture-of-Experts model
Phi-3.5-MoE is the first model in the Phi family to use the Mixture-of-Experts (MoE) technique. The model combines 16 experts of 3.8B parameters each (16 x 3.8B) but activates only 6.6 billion parameters per token by routing each token to 2 experts, and was trained on 4.9T tokens using 512 H100 GPUs.
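The point of the "6.6 billion active parameters" figure is that only 2 of the 16 expert feed-forward blocks run for any given token, even though all 16 stay in memory. The toy top-2 routing layer below illustrates the mechanism under simplified assumptions; it is not Microsoft's actual implementation.

```python
# Toy illustration of top-2 Mixture-of-Experts routing: 16 experts exist,
# but each token only passes through the 2 the router selects.
# Simplified sketch, not the actual Phi-3.5-MoE code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):          # only the selected experts do work
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 512)                    # 4 tokens with hidden size 512
print(ToyMoELayer()(tokens).shape)              # torch.Size([4, 512])
```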
The Microsoft research team designed the model from scratch to further improve its performance. In standard AI benchmarks, Phi-3.5-MoE outperforms Llama-3.1 8B, Gemma-2-9B, and Gemini-1.5-Flash, and is close to the current leader, GPT-4o-mini.
Phi-3.5-vision: enhanced multi-frame image understanding
Phi-3.5-vision has a total of 4.2 billion parameters and was trained on 500B tokens using 256 A100 GPUs; it now supports multi-frame image understanding and reasoning.
Phi-3.5-vision has improved performance on MMMU (from 40.2 to 43.0), MMBench (from 80.5 to 81.9), and the document understanding benchmark TextVQA (from 70.9 to 72.0).
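Multi-frame support means several images can be interleaved into a single prompt, for example frames from a video or pages of a slide deck. The sketch below follows the pattern published for the earlier Phi-3-vision checkpoint; the repo id, the `<|image_N|>` placeholder syntax, and the processor call are assumptions here rather than a verified recipe for this release.

```python
# Hedged sketch: feed several frames to Phi-3.5-vision in one prompt.
# Repo id and <|image_N|> placeholder syntax assumed from the Phi-3-vision pattern.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"  # assumed repo id
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Several frames from the same clip or document.
frames = [Image.open(f"frame_{i}.png") for i in range(1, 4)]
placeholders = "".join(f"<|image_{i}|>\n" for i in range(1, len(frames) + 1))

messages = [{"role": "user", "content": placeholders + "Describe what changes across these frames."}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(prompt, frames, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```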
Phi-3.5-mini: lightweight yet capable
Phi-3.5-mini is a 3.8 billion parameter model that surpasses Llama-3.1 8B and Mistral 7B and even rivals Mistral NeMo 12B.
The model was trained on 3.4T tokens using 512 H100 GPUs. With only 3.8B active parameters, it is competitive on multilingual tasks with LLMs that have far more active parameters.
In addition, Phi-3.5-mini supports a 128K context window, while its main competitor, the Gemma-2 series, supports only 8K.