In a blog post on August 21st, NVIDIA released the Mistral-NeMo-Minitron 8B small language AI model, which combines high accuracy with computational efficiency. The model can run on GPU-accelerated data centers, clouds, and workstations.
Last month, NVIDIA and Mistral AI released the open-source Mistral NeMo 12B model. Building on it, NVIDIA has now followed up with the smaller Mistral-NeMo-Minitron 8B, an 8-billion-parameter model that can run on workstations with NVIDIA RTX graphics cards.
NVIDIA says that Mistral-NeMo-Minitron 8B was obtained by width-pruning Mistral NeMo 12B and then lightly retraining it with knowledge distillation, a technique published in the paper "Compact Language Models via Pruning and Knowledge Distillation".
Pruning shrinks a neural network by removing the model weights that contribute the least to accuracy. During distillation, the team then retrained the pruned model on a small dataset, recovering much of the accuracy lost in the pruning step.
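To make the two ideas concrete, here is a minimal, hypothetical NumPy sketch, not NVIDIA's actual pipeline: width pruning drops whole neurons (columns of a weight matrix) with the lowest importance score, and a distillation loss measures how far the pruned "student" output has drifted from the original "teacher" output.

```python
# Illustrative sketch of width pruning and a distillation loss.
# All names here are invented for the example; this is not NVIDIA's code.
import numpy as np

def width_prune(w, keep_neurons):
    """Drop the columns (neurons) of a weight matrix with the smallest L2 norm."""
    norms = np.linalg.norm(w, axis=0)               # importance score per neuron
    keep = np.sort(np.argsort(norms)[-keep_neurons:])
    return w[:, keep]

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    p = softmax(teacher_logits / temperature)
    q = softmax(student_logits / temperature)
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 6))                         # toy layer with 6 neurons
print(width_prune(w, 3).shape)                      # → (8, 3): half the neurons removed

teacher = np.array([2.0, 0.5, -1.0])
student = np.array([1.8, 0.7, -0.9])
print(distillation_loss(teacher, student))          # small, nonzero loss to minimize
```

In real distillation the student is trained to minimize this loss (often mixed with the ordinary next-token loss) across a large corpus of teacher outputs; the sketch only shows the shape of the computation.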
For its size, Mistral-NeMo-Minitron 8B leads the pack on nine popular language-modeling benchmarks. These benchmarks cover a wide range of tasks, including language comprehension, common-sense reasoning, mathematical reasoning, summarization, coding, and the ability to generate truthful answers.