After months of anticipation, the TinyLlama project has released a striking open source model. The project began last September, when its developers set out to train a small model on trillions of tokens. After considerable work and some setbacks, the TinyLlama team has now released the model: it has 1.1 billion parameters, and training took about three epochs, i.e., three passes through the training data.
The final version of TinyLlama outperforms existing open source language models of similar size, including Pythia-1.4B, OPT-1.3B, and MPT-1.3B. This marks a milestone and opens up new possibilities for the field of language models.
The model is not only small and strong for its size, it is also well suited to deployment on edge devices, taking up just 637MB of storage. Even more interesting, TinyLlama can be used to assist speculative decoding of larger models, which offers a more flexible option for tasks that rely on large models. The team cites a tutorial by Andrej Karpathy, formerly Tesla's Director of AI and now at OpenAI, that highlights TinyLlama's potential in this area.
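Speculative decoding pairs a small draft model with a larger target model: the draft proposes several tokens at a time, and the large model verifies them in a single forward pass. As a rough sketch of this idea, the snippet below uses the assisted-generation feature of the Hugging Face transformers library; the checkpoint names and the pairing of TinyLlama with Llama 2 are illustrative assumptions, not a setup prescribed by the TinyLlama team.

```python
# Sketch: speculative (assisted) decoding with TinyLlama as the draft model.
# Checkpoint names are assumptions; the Llama 2 repo is gated on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-2-7b-hf"            # larger target model (assumed)
draft_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # small draft model (assumed)

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.float16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.float16, device_map="auto")

prompt = "Speculative decoding works by letting a small model draft tokens that"
inputs = tokenizer(prompt, return_tensors="pt").to(target.device)

# The draft model proposes a few tokens per step; the target model verifies
# them in one pass, which can speed up generation.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

This pairing works because the two models share the same tokenizer and vocabulary, which is exactly what TinyLlama's Llama-compatible design provides.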
The TinyLlama team designed it as a compact version of Meta's open source language model Llama 2, sharing the same architecture and tokenizer. This means it can be plugged into projects built on Llama with little effort, giving researchers and practitioners an "attractive" platform for language model research. Despite its small size, TinyLlama has already demonstrated a wide range of uses across language model research.
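Because the architecture and tokenizer match Llama 2, TinyLlama checkpoints load through the same code paths as Llama 2 checkpoints. The following is a minimal sketch of such a drop-in load with Hugging Face transformers; the checkpoint name is shown only as an example.

```python
# Sketch: loading TinyLlama as a drop-in replacement in Llama-based code.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Because TinyLlama reuses Llama 2's architecture, this resolves to the same
# LlamaForCausalLM class that Llama 2 checkpoints load into.
print(type(model).__name__)

inputs = tokenizer("TinyLlama is a compact language model that", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```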
In practical use, Awni Hannun, a machine learning research scientist at Apple, fine-tuned TinyLlama with LoRA on an 8GB Mac Mini using MLX, Apple's open source machine learning framework, illustrating how adaptable the model is across hardware and workflows. The team said: "With its compact architecture and excellent performance, TinyLlama can enable end-user applications on mobile devices and serve as a lightweight platform for testing innovative ideas related to language models."
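The MLX workflow Hannun used is specific to Apple silicon; as a more general illustration of the same idea, here is a rough sketch of attaching LoRA adapters to TinyLlama with Hugging Face PEFT. The target module names and hyperparameters are assumptions for illustration, not settings taken from his experiment.

```python
# Sketch: LoRA adapters on TinyLlama via Hugging Face PEFT (not the MLX setup
# described above). Hyperparameters and target modules are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

lora_cfg = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # Llama-style attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the small adapter weights are trainable
# From here, the wrapped model can be trained with a standard Trainer loop,
# keeping memory needs low enough for modest hardware.
```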
Alongside the release, the team said it plans to publish "improved versions" that expand the model's performance and versatility, opening up more possibilities for future language model research.
TinyLlama also arrives amid a broader push toward small AI models. Several companies have begun focusing on relatively small but capable models to reduce hardware and operating costs. Microsoft's Phi project is one example: its Phi-2 model outperforms models up to 25 times its size, showing the potential of small models. Google has likewise announced Gemini Nano, a small version of its new flagship base model, expected to be about 3.2 billion parameters in size.
Many of these small models owe their strong results to training on synthetic data generated by larger models. This trend is driving innovation in artificial intelligence and has enabled a number of small models to perform comparably to cutting-edge models such as OpenAI's GPT.
Project URL: https://github.com/jzhang38/TinyLlama