Qwen2-Audio: The Qwen Series' Audio Multimodal Model Enables Voice Interaction Without Text

Alibaba Cloud's latest release is a large-scale audio-language model called Qwen2-Audio. The model accepts a variety of audio signal inputs and can either analyze the audio or respond directly to voice commands, greatly improving the voice interaction experience.

In this release, Qwen2-Audio provides two distinct audio interaction modes: voice chat and audio analysis. In voice chat, users can hold spoken conversations with Qwen2-Audio without typing any text. In audio analysis, users can supply audio together with text commands for analysis during the interaction, making for a more convenient experience.

Qwen2-Audio can intelligently understand the content in the audio and respond appropriately to voice commands. For example, in an audio segment that contains sounds, multi-speaker conversations, and voice commands at the same time, Qwen2-Audio can directly understand the command and provide an interpretation and response to the audio.

In addition, direct preference optimization (DPO) is used to improve the model's factuality and its adherence to expected behavior. According to AIR-Bench evaluation results, Qwen2-Audio outperforms previous state-of-the-art (SOTA) models such as Gemini-1.5-pro on tests focused on audio-centric instruction-following capabilities. Qwen2-Audio is open source, with the aim of advancing the multimodal language community.

The Qwen2-Audio series reportedly comprises two models, Qwen2-Audio and Qwen-Audio-Chat, giving users a richer audio interaction experience.

The researchers conducted a comprehensive evaluation of Qwen2-Audio, assessing its performance on a variety of tasks without any task-specific fine-tuning. On English automatic speech recognition (ASR), Qwen2-Audio outperformed previous multi-task learning models.

To assess Qwen2-Audio's chat capabilities, the researchers measured its performance on the AIR-Bench chat benchmark (Yang et al., 2024). Qwen2-Audio demonstrated state-of-the-art (SOTA) instruction-following capabilities across the speech, sound, music, and mixed-audio subsets. It shows substantial improvements over Qwen-Audio and significantly outperforms other large audio-language models (LALMs).
