Ali Tongyi Qianqian Releases Qwen2.5-Turbo AI Model: Supports 1 Million Tokens Contexts, Processing Time Reduced to 68 Seconds

Nov. 19, 2012 (Newswire.com) - AliThousand Questions on TongyiYesterday (November 18), a blog post was published announcing that after months of optimization and polishing, in response to community requests for a longer Context LengthThe Qwen2.5-Turbo has been launched. Open Source AI Models.

Ali Tongyi Qianqian Releases Qwen2.5-Turbo AI Model: Supports 1 Million Tokens Contexts, Processing Time Reduced to 68 Seconds

Qwen2.5-Turbo extends the context length from 128,000 to 1 million tokens, an improvement equivalent to about 1 million English words or 1.5 million Chinese characters, which can hold 10 complete novels, 150 hours of speech, or 30,000 lines of code.

Note: Context Length (CL) is the maximum length of text that can be considered and generated by a Large Language Model (LLM) in Natural Language Processing (NLP) in a single processing session.

The model achieves 100% accuracy on the 1M-token Passkey retrieval task with a RULER long text evaluation score of 93.1, outperforming GPT-4 and GLM4-9B-1M.

Ali Tongyi Qianqian Releases Qwen2.5-Turbo AI Model: Supports 1 Million Tokens Contexts, Processing Time Reduced to 68 Seconds

By integrating sparse attention mechanisms, the team reduced the time from processing 1 million tokens to outputting the first tokens from 4.9 minutes to 68 seconds, a 4.3-fold increase in speed, which significantly improves the model's response efficiency and makes it more responsive when processing long texts.

Qwen2.5-Turbo keeps the processing cost at $0.3 per million tokens, and is able to process 3.6 times the number of tokens of GPT-4o-mini. This makes Qwen2.5-Turbo economically competitive and an efficient, cost-effective solution for long context processing.

Although Qwen2.5-Turbo performed well in several benchmarks, the team was still aware that performance on long sequence tasks in real scenarios might not be stable enough and that the inference cost of large models needed to be further optimized.

The team promises to continue optimizing human preferences, improving inference efficiency, and exploring more robust long context models.

Attach reference address

statement:The content is collected from various media platforms such as public websites. If the included content infringes on your rights, please contact us by email and we will deal with it as soon as possible.
Information

Tencent Distinguished Scientist and One of the Technical Leaders of the Hybrid Large Model Liu Wei Leaves, Says Source

2024-11-19 21:31:13

Information

Google Responds to Gemini Chatbot Response of "Death to Humans": Steps Taken to Prevent Similar Incidents from Happening Again

2024-11-19 21:36:12

Search