AI2 has released the Open Language Model (OLMo) framework, which is designed to promote research and experimentation on large-scale language models. By providing training code, models, and evaluation code on Hugging Face and GitHub, AI2 aims to enable academics and researchers to jointly study the science of language models, explore the impact of new pre-training data subsets on downstream performance, and investigate new pre-training methods and stability.
The first batch of models in the project includes four final 7B-scale variants, corresponding to different architectures, optimizers, and training hardware, plus a 1B-scale model, all trained on at least 2T tokens. This is only the first step in a long-term plan: AI2 intends to continue releasing larger models, instruction-tuned models, and more variants.
Each model is provided with its complete training data, including the code used to generate that data, as well as AI2's Dolma and WIMBD tools for analyzing the pre-training data. In addition, complete model weights, training code, training logs, training metrics in the form of Weights & Biases logs, and inference code are also provided. More than 500 checkpoints from each model's training run are available as revisions on Hugging Face.
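Because the intermediate checkpoints are published as Hub revisions, they can be pulled with the standard `transformers` loading API. The sketch below is illustrative only: the revision string is a hypothetical example of a checkpoint name, and loading may also require installing AI2's OLMo integration package, so consult the model card for the exact identifiers.

```python
# Illustrative sketch: loading an intermediate OLMo checkpoint from the Hugging Face Hub.
# The revision name below is hypothetical; check the model card for the actual
# checkpoint revision names.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "allenai/OLMo-7B"
revision = "step1000-tokens4B"  # hypothetical revision label for one training checkpoint

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    revision=revision,          # each published training checkpoint is a Hub revision
    trust_remote_code=True,
)
```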
In creating a strong open model, AI2 learned from many other open and partially open models and used them as competitive benchmarks for OLMo. The technical report of the project mentioned that the OLMo7B model surpassed the OLMo7B model in aspects such as generation tasks or reading comprehension (such as truthfulQA).Llama2, but lags slightly behind on popular question answering tasks such as MMLU or Big-bench Hard.
For the 1B OLMo model, an analysis was performed using AI2's Paloma benchmark and the checkpoints available on GitHub to explore the relationship between the model's language-prediction performance and factors such as model scale. AI2 emphasized that Paloma attempts to give a more balanced representation of the many domains in which language models are used by sampling evenly from each domain.
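To make the "balanced representation" idea concrete, the sketch below shows one simple way to weight every domain equally when aggregating language-modeling results, rather than letting the largest domains dominate a token-weighted average. It is a hypothetical illustration of the principle, not Paloma's actual aggregation code.

```python
import math

# Illustrative sketch (not Paloma's implementation): give every domain equal weight
# by computing perplexity per domain and macro-averaging, so a huge web-crawl domain
# cannot drown out smaller but important domains.
def macro_average_perplexity(domain_losses):
    """domain_losses maps a domain name to a list of per-token negative log-likelihoods."""
    per_domain_ppl = [
        math.exp(sum(losses) / len(losses)) for losses in domain_losses.values()
    ]
    # Each domain contributes the same weight to the final score.
    return sum(per_domain_ppl) / len(per_domain_ppl)

print(macro_average_perplexity({
    "web":    [2.1, 2.3, 2.0],
    "code":   [1.4, 1.5],
    "papers": [2.8, 2.6, 2.7, 2.9],
}))
```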
The OLMo framework adopts many of the latest trends in the literature, including omitting bias terms (as in PaLM, for stability), the SwiGLU activation function used by PaLM and Llama, Rotary Positional Embedding (RoPE), and a modified version of GPT-NeoX-20B's BPE-based tokenizer designed to reduce personally identifiable information.
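For readers unfamiliar with SwiGLU, the following PyTorch sketch shows a bias-free SwiGLU feed-forward block in the spirit of the choices described above. The layer names and dimensions are assumptions for illustration, not OLMo's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch of a SwiGLU feed-forward block with no bias terms.
# Names and sizes are hypothetical, not taken from the OLMo codebase.
class SwiGLUFeedForward(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)  # gate projection
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)    # value projection
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: SiLU(x @ W_gate) multiplied elementwise with x @ W_up, then projected down.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```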
This release is just the beginning for OLMo and the framework, with future work planned across different model scales, modalities, datasets, safety measures, and evaluations. AI2 encourages use of the OLMo models, provides simple installation steps and usage examples, and says it will later release features such as instruction-tuned models, complete training logs, and Weights & Biases (wandb) reports.
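A minimal usage sketch along the lines of AI2's examples is shown below. The package name on the install line and the repository id are taken from the release materials but may change, so the OLMo README should be treated as the authoritative source.

```python
# Illustrative usage sketch; install command and repository id are assumptions
# based on the release materials — consult the OLMo README for exact instructions.
#   pip install ai2-olmo
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B", trust_remote_code=True)

# Generate a short continuation from a prompt.
inputs = tokenizer("Language modeling is ", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```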
Blog URL: https://blog.allenai.org/olmo-open-language-model-87ccfc95f58