Alibaba Cloud Tongyi Qianwen open-sources two voice base models, with better recognition performance than OpenAI Whisper

Alibaba Cloud's Tongyi Qianwen has open-sourced two voice base models: SenseVoice (for speech recognition) and CosyVoice (for speech generation).

SenseVoice focuses on high-precision multilingual speech recognition, emotion recognition, and audio event detection. It has the following characteristics:

Multilingual recognition: Trained on more than 400,000 hours of data and supporting more than 50 languages, it achieves better recognition results than the Whisper model.
Rich transcription: It offers strong emotion recognition, matching or exceeding the best current emotion recognition models on test data. It also supports sound event detection, covering common human-computer interaction events such as music, applause, laughter, crying, coughing, and sneezing.
Efficient inference: The SenseVoice-Small model uses a non-autoregressive end-to-end framework with extremely low inference latency. Inference on 10 seconds of audio takes only 70 ms, 15 times faster than Whisper-Large.
Fine-tuning customization: It provides convenient fine-tuning scripts and strategies to help users fix long-tail sample problems in their own business scenarios.
Service deployment: It provides a complete service deployment pipeline, supports multiple concurrent requests, and offers clients in languages such as Python, C++, HTML, Java, and C#.
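
To put the quoted latency figures in perspective, the short Python sketch below works through the arithmetic. It assumes the "15 times" figure is a latency ratio measured on the same 10-second clip, which the announcement does not state explicitly. The Whisper-Large number it prints is therefore only implied by that ratio, not a measured value, and the function and variable names are illustrative.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures quoted in the article for SenseVoice-Small.
AUDIO_SECONDS = 10.0
SENSEVOICE_SMALL_LATENCY = 0.070  # 70 ms, in seconds
SPEEDUP_OVER_WHISPER_LARGE = 15   # "15 times", as quoted

rtf = real_time_factor(SENSEVOICE_SMALL_LATENCY, AUDIO_SECONDS)

# Assumption: the 15x figure is a latency ratio on the same 10 s clip,
# so this is an implied value, not a benchmark result.
implied_whisper_latency = SENSEVOICE_SMALL_LATENCY * SPEEDUP_OVER_WHISPER_LARGE

# Seconds of audio processed per second of wall-clock time.
audio_seconds_per_wall_second = 1 / rtf

print(f"SenseVoice-Small real-time factor: {rtf:.3f}")
print(f"Implied Whisper-Large latency: {implied_whisper_latency:.2f} s")
print(f"Audio processed per wall-clock second: {audio_seconds_per_wall_second:.0f} s")
```

Under these assumptions, the real-time factor comes out well below 1, meaning the model transcribes audio far faster than it plays back.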

Compared with other open-source emotion recognition models, the SenseVoice-Large model achieves the best results on almost all datasets, and the SenseVoice-Small model also outperforms other open-source models on most datasets.

The CosyVoice model supports multilingual generation as well as timbre and emotion control. It performs well in multilingual speech generation, zero-shot speech generation, cross-lingual voice cloning, and instruction following.
