On March 14, 2023, the GLM technical team open-sourced ChatGLM-6B, which attracted wide attention and recognition. ChatGLM3-6B was released later, and developers have been looking forward to an open-source fourth generation of GLM models. After nearly half a year of exploration, the GLM technical team has launched the fourth-generation open-source GLM series model: GLM-4-9B.
For pre-training, GLM-4-9B used a large language model to screen data, yielding 10T of high-quality multilingual data, more than 3 times the data volume of ChatGLM3-6B. FP8 was also adopted for efficient pre-training, improving training efficiency by 3.5 times. Within a limited GPU memory budget, the team explored the performance ceiling and found that a 6B model was limiting. Considering the GPU memory available to most users, the model size was therefore increased to 9B, and the pre-training compute was increased 5-fold.
The GLM-4-9B models offer stronger reasoning performance, longer context handling, and multilingual, multimodal, and All Tools capabilities. The release includes the base version GLM-4-9B (8K), the chat version GLM-4-9B-Chat (128K), the extra-long-context version GLM-4-9B-Chat-1M (1M), and the multimodal version GLM-4V-9B-Chat (8K).
GLM-4-9B capabilities include:
1. Basic capability: overall performance in Chinese and English is 40% higher than that of ChatGLM3-6B;
2. Long-text capability: the context window is extended from 128K up to 1M tokens, roughly the length of 2 volumes of Dream of the Red Chamber or 125 academic papers;
3. Multilingual capability: supports 26 languages; the vocabulary is expanded to 150k tokens, improving encoding efficiency by 30%;
4. Function call capability: excellent performance on the Berkeley Function-Calling Leaderboard;
5. All Tools capability: the model can invoke external tools to complete tasks;
6. Multimodal capability: a multimodal model is introduced to the series for the first time, with remarkable performance.
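To give a sense of how the function call capability above is typically exercised, here is a minimal sketch of building an OpenAI-style tool definition and chat message list, the common request shape for function-calling chat models. The `get_weather` tool, its parameters, and the helper function are invented for illustration; they are not part of the GLM-4 API.

```python
import json

def make_tool(name, description, parameters):
    """Wrap a JSON-Schema function spec in the OpenAI-style tool envelope.
    (Hypothetical helper for illustration only.)"""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }

# Hypothetical tool the model may choose to call.
weather_tool = make_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

# Conversation so far; the model would respond with either text or a tool call.
messages = [
    {"role": "system", "content": "You can call tools to answer questions."},
    {"role": "user", "content": "What's the weather in Beijing?"},
]

# The request body a serving layer would pass to the model.
request = {"messages": messages, "tools": [weather_tool]}
print(json.dumps(request, indent=2))
```

The model's reply would then name the tool and supply arguments matching the JSON Schema; the caller runs the tool and feeds the result back as a follow-up message.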
Code:
GitHub: https://github.com/THUDM/GLM-4
Model:
Hugging Face: https://huggingface.co/collections/THUDM/glm-4-665fcf188c414b03c2f7e3b7
ModelScope Community: https://modelscope.cn/organization/ZhipuAI