Alibaba Tongyi's audio generation model FunAudioLLM is open source and supports scenarios such as emotional voice dialogue and audiobooks

Alibaba's Tongyi Lab recently open-sourced FunAudioLLM, a large-model framework for audio generation and understanding that aims to improve natural voice interaction between humans and large language models (LLMs). The project consists of two core models: SenseVoice and CosyVoice.

CosyVoice focuses on natural speech generation, with multi-language support and controllable timbre and emotion. It excels at multilingual speech generation, zero-shot voice generation, cross-lingual voice cloning, and instruction following. Trained on 150,000 hours of data, it supports five languages (Chinese, English, Japanese, Cantonese, and Korean), can quickly clone a voice from a short sample, and offers fine-grained control over emotion and prosody.

SenseVoice is dedicated to high-precision multilingual speech recognition, emotion recognition, and audio event detection. Trained on 400,000 hours of data, it supports more than 50 languages and outperforms the Whisper model in recognition accuracy, with improvements of more than 50% on Chinese and Cantonese in particular. SenseVoice also offers emotion recognition, sound event detection, and fast inference.


FunAudioLLM supports a range of human-computer interaction scenarios, including multilingual speech translation, emotional voice conversations, interactive podcasts, and audiobooks. By chaining SenseVoice, LLMs, and CosyVoice together, it enables seamless speech-to-speech translation, emotional voice chat applications, and interactive podcast radio stations.
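The cascaded design described above can be sketched as a simple three-stage pipeline. Note that this is an illustrative sketch only: the function bodies below are stubs, and the names are placeholders, not the actual FunAudioLLM API.

```python
# Hypothetical sketch of the cascaded speech-to-speech translation pipeline:
# SenseVoice (ASR) -> LLM (translation) -> CosyVoice (TTS).
# All three stages are stubbed; in a real system each would call the model.

def sensevoice_transcribe(audio: bytes) -> str:
    """Placeholder for SenseVoice ASR: audio in, source-language text out."""
    return "你好，世界"  # stubbed recognition result

def llm_translate(text: str, target_lang: str) -> str:
    """Placeholder for an LLM translation step."""
    translations = {"你好，世界": "Hello, world"}  # stubbed lookup
    return translations.get(text, text)

def cosyvoice_synthesize(text: str) -> bytes:
    """Placeholder for CosyVoice TTS: text in, synthesized audio out."""
    return text.encode("utf-8")  # stand-in for a generated waveform

def speech_to_speech_translate(audio: bytes, target_lang: str = "en") -> bytes:
    """Chain the three stages: recognize, translate, then synthesize."""
    text = sensevoice_transcribe(audio)
    translated = llm_translate(text, target_lang)
    return cosyvoice_synthesize(translated)
```

The same chaining pattern underlies the emotional voice chat use case, with the translation step replaced by a conversational LLM.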

In terms of technical principles, CosyVoice is built on speech quantization coding to produce natural, fluent speech, while SenseVoice provides comprehensive speech understanding, including automatic speech recognition, language identification, emotion recognition, and audio event detection.
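The speech quantization coding mentioned above rests on a simple idea: continuous audio frames are mapped to discrete tokens by finding the nearest entry in a codebook, and a language model then generates those tokens. The toy example below illustrates only this nearest-neighbor quantization step; real codebooks are learned and far larger.

```python
# Toy illustration of speech token quantization: map each frame vector to the
# index of its nearest codebook entry (L2 distance). The codebook and frame
# values here are made up for illustration.

def quantize(frame, codebook):
    """Return the index of the codebook vector closest to `frame`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frames = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]
tokens = [quantize(f, codebook) for f in frames]
# tokens == [0, 1, 2]: a discrete token sequence a speech LM could model
```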

The open-source models and code have been released on ModelScope and Hugging Face, and the training, inference, and fine-tuning code is also available on GitHub. Both CosyVoice and SenseVoice offer online demos on ModelScope, allowing users to try these voice technologies directly.

Project address: https://github.com/FunAudioLLM
