Alibaba releases new voice model Qwen2-Audio, surpassing OpenAI Whisper

recently,AlibabaBased on its Qwen-Audio, it launched a new open sourceVoice Model Qwen2-Audio. This model not only performs well in speech recognition, translation, and audio analysis, but also achieves significant improvements in functionality and performance. Qwen2-Audio provides a basic version and a command fine-tuning version. Users can ask questions to the audio model through voice, and recognize and analyze the content.

For example, users can ask a woman to speak a paragraph, and Qwen2-Audio can determine her age or analyze her emotions; if a noisy sound is input, the model can analyze the various sound components in it. Qwen2-Audio supports multiple languages including Chinese, Cantonese, French, English and Japanese, which greatly facilitates the development of sentiment analysis and translation applications.

Product entrance: https://top.aibase.com/tool/qwen2-audio

Compared with the first generation of Qwen-Audio, Qwen2-Audio has been fully optimized in terms of architecture and performance. In the pre-training stage, this new model uses more natural language prompts to replace the previous complex hierarchical labels. This improvement makes the model more handy in understanding and responding to various tasks, and its generalization ability has also been significantly improved.

Qwen2-Audio's command-following ability has also been greatly improved, and it can understand user commands more accurately. For example, when a user issues a command to "analyze the emotional tendency in this audio", Qwen2-Audio can accurately judge the emotions contained in the audio. In addition, the model introduces two modes: voice chat and audio analysis, making the user's voice interaction more natural. In audio analysis mode, Qwen2-Audio can deeply analyze various types of audio and provide detailed and accurate analysis results.

To ensure that the model's output meets human expectations, Qwen2-Audio also introduces advanced techniques such as supervised fine-tuning and direct preference optimization. When interacting with humans, the model appears more natural and accurate.

In terms of performance testing, Qwen2-Audio performed well in multiple mainstream benchmarks, especially in speech recognition and translation accuracy, surpassing OpenAI's Whisper-large-v3. The performance of this new model has not only attracted widespread attention in the industry, but also heralded a new future for voice technology.

 

statement:The content of the source of public various media platforms, if the inclusion of the content violates your rights and interests, please contact the mailbox, this site will be the first time to deal with.
Information

Anthropic expands bug bounty program to test next-generation AI safety systems

2024-8-11 8:44:39

Information

Beihang University releases "Xiaohang" AI assistant: 200 PFlops computing power, 12PB storage capacity

2024-8-11 8:46:06

Search