Before WWDC24, Apple released an "efficient language model with an open source training and inference framework" called OpenELM on the Hugging Face platform.
True to its name, it is fully open source: the source code, pre-trained model weights, and training recipes are all available in Apple's GitHub repository.
The official introduction is translated as follows:
The reproducibility and transparency of large language models are critical to advancing open research, ensuring the trustworthiness of results, and enabling investigation into data and model biases as well as potential risks. To this end, we release OpenELM, a state-of-the-art open source language model.
OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the Transformer model, thereby improving accuracy. For example, with a parameter budget of about one billion, OpenELM improves accuracy by 2.36% compared to OLMo while requiring only half as many pre-training tokens.
Unlike previous practices that provide only model weights and inference code and pre-train on private datasets, our release includes the complete framework for training and evaluating language models on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations.
We also release code to convert models to the MLX library for inference and fine-tuning on Apple devices. This comprehensive release is intended to empower and strengthen the open research community and pave the way for future open research.
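To make the layer-wise scaling idea from the introduction above more concrete, here is a minimal Python sketch. It is not Apple's implementation; the function name `layer_wise_config` and the scaling constants are illustrative assumptions. The point is simply that the number of attention heads and the FFN width grow with depth instead of being identical in every layer, so a fixed parameter budget is redistributed across the stack.

```python
# A minimal sketch (not Apple's actual implementation) of layer-wise scaling:
# instead of giving every Transformer layer the same number of attention heads
# and the same FFN width, both grow linearly with depth, redistributing
# parameters toward deeper layers while the total budget stays comparable.

def layer_wise_config(num_layers, model_dim, head_dim,
                      alpha_min=0.5, alpha_max=1.0,
                      beta_min=0.5, beta_max=4.0):
    """Return per-layer (num_heads, ffn_dim) under a linear scaling rule.

    alpha_* scale the attention width, beta_* scale the FFN multiplier.
    The specific constants here are illustrative, not OpenELM's values.
    """
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)      # 0.0 at the first layer, 1.0 at the last
        alpha = alpha_min + t * (alpha_max - alpha_min)
        beta = beta_min + t * (beta_max - beta_min)
        num_heads = max(1, round(alpha * model_dim / head_dim))
        ffn_dim = int(round(beta * model_dim))
        configs.append((num_heads, ffn_dim))
    return configs

if __name__ == "__main__":
    # Example: a 16-layer model with model dimension 2048 and head size 64.
    for layer, (heads, ffn) in enumerate(layer_wise_config(16, 2048, 64)):
        print(f"layer {layer:2d}: heads={heads:3d}  ffn_dim={ffn}")
```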
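As for what the MLX conversion enables in practice, the following sketch runs an MLX-converted OpenELM checkpoint on an Apple Silicon Mac using the community mlx-lm package (`pip install mlx-lm`). The checkpoint name is an assumption; point `load` at whichever MLX-converted OpenELM model you actually have locally or on the Hub.

```python
# A minimal sketch of on-device inference with an MLX-converted OpenELM model.
# The repository name below is assumed for illustration; substitute the
# MLX-converted checkpoint you actually use.

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/OpenELM-270M-Instruct")  # assumed repo name

prompt = "Once upon a time there was"
text = generate(model, tokenizer, prompt=prompt, max_tokens=100)
print(text)
```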