Beijing Zhipu Huazhang Technology Co., Ltd. (Zhipu AI) recently announced that the GLM-4-Flash API will be open to the public free of charge, in order to promote the adoption of large-model technology.
The GLM-4-Flash model shows clear advantages in both speed and performance, particularly in inference speed. Through optimizations such as adaptive weight quantization, parallel processing, batching strategies, and speculative sampling, it reaches a stable generation speed of up to 72.14 tokens/s, which is outstanding among comparable models.
On the performance side, the GLM-4-Flash model was pre-trained on 10TB of high-quality multilingual data. As a result, it can handle tasks such as multi-turn dialogue, web search, and tool calling, and it also supports long-context reasoning with a maximum context length of 128K tokens. In addition, the model supports 26 languages, including Chinese, English, Japanese, Korean, and German, demonstrating strong multilingual capability.
To meet users' specific needs, Zhipu AI also provides a model fine-tuning feature to help users adapt GLM-4-Flash to various application scenarios.
API address: https://open.bigmodel.cn/dev/api#glm-4
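As a rough illustration, a request to the GLM-4-Flash API can be sketched as below. This is a minimal sketch assuming an OpenAI-style chat-completions schema; the endpoint path, payload shape, and model identifier here are assumptions, and the API documentation linked above is authoritative.

```python
import json

# Assumed endpoint path; verify against the official docs at
# https://open.bigmodel.cn/dev/api#glm-4 before use.
API_URL = "https://open.bigmodel.cn/api/paas/v4/chat/completions"


def build_request(prompt: str, api_key: str) -> tuple[dict, dict]:
    """Build (headers, payload) for a hypothetical GLM-4-Flash chat call."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # bearer auth is an assumption
        "Content-Type": "application/json",
    }
    payload = {
        "model": "glm-4-flash",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, payload


if __name__ == "__main__":
    headers, payload = build_request("Hello, GLM-4-Flash", "YOUR_API_KEY")
    print(json.dumps(payload, ensure_ascii=False, indent=2))
    # To actually send the request, one could use e.g.:
    #   requests.post(API_URL, headers=headers, json=payload)
```

The sketch only constructs the request; sending it requires a valid API key obtained from the platform.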