Voice actors in danger! Microsoft's VALL-E 2 voice cloning reaches voice-actor level

Recently, Microsoft's newly published zero-shot text-to-speech (TTS) model VALL-E 2 has attracted widespread attention in the technology community. According to Microsoft, it is the first model to achieve human-level speech synthesis, and it is widely regarded as a milestone in the TTS field.


Technical highlights and innovations:

Zero-shot learning: VALL-E 2 needs only a short sample of an unseen speaker's voice to speak any text in that same voice, demonstrating impressive instant imitation.

Repetition-aware sampling: An improved sampling strategy that effectively mitigates the infinite-loop problem during decoding and improves decoding stability.

Grouped code modeling: Grouping the codec codes shortens the sequence length, which speeds up inference while also improving performance.

Simplified training data requirements: VALL-E 2 needs only simple pairs of speech and its transcription for training, which greatly simplifies data collection and processing.

Performance evaluation: On subjective scores (SMOS and CMOS) and objective metrics (SIM, WER, and DNSMOS), VALL-E 2 not only surpasses its predecessor VALL-E but also outperforms real human speech on some measures.
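To make the repetition-aware sampling idea concrete, here is a minimal Python sketch based on our reading of the general approach: a candidate token is drawn with nucleus (top-p) sampling, and if that token already makes up too large a share of the most recently decoded tokens, the decoder falls back to plain random sampling over the full distribution. The function names and the default values of `top_p`, `window`, and `threshold` are illustrative assumptions, not Microsoft's code or published settings.

```python
import random
from typing import List, Optional, Sequence


def nucleus_sample(probs: Sequence[float], top_p: float, rng: random.Random) -> int:
    """Top-p sampling: draw from the smallest high-probability set whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    mass = 0.0
    for token in order:
        kept.append(token)
        mass += probs[token]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[t] for t in kept], k=1)[0]


def random_sample(probs: Sequence[float], rng: random.Random) -> int:
    """Plain random sampling from the full distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def repetition_aware_sample(
    probs: Sequence[float],
    history: List[int],
    top_p: float = 0.8,      # hypothetical placeholder, not a published setting
    window: int = 10,        # hypothetical placeholder
    threshold: float = 0.1,  # hypothetical placeholder
    rng: Optional[random.Random] = None,
) -> int:
    """Nucleus-sample a candidate; fall back to random sampling if it repeats too often."""
    if window <= 0:
        raise ValueError("window must be positive")
    if rng is None:
        rng = random.Random()
    candidate = nucleus_sample(probs, top_p, rng)
    # Share of the last `window` decoded tokens that equal the candidate.
    ratio = history[-window:].count(candidate) / window
    if ratio > threshold:
        return random_sample(probs, rng)
    return candidate
```

When the recent history is free of the candidate, the sketch behaves exactly like nucleus sampling; only a token that keeps recurring triggers the broader random draw, which is what breaks a potential loop.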
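Grouped code modeling is mostly about sequence length. The sketch below is a hypothetical illustration rather than VALL-E 2's implementation, and it only shows the packing step: consecutive codec codes are bundled into groups of size G, so a model that predicts one group per step needs ceil(T/G) steps instead of T. How each group is embedded and decoded inside the model is not shown, and `group_codes`, `ungroup_codes`, and the `pad_id` padding scheme are assumptions.

```python
from typing import List, Tuple


def group_codes(codes: List[int], group_size: int, pad_id: int = 0) -> List[Tuple[int, ...]]:
    """Pack consecutive codec codes into groups of `group_size`, padding the last group."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    pad = (-len(codes)) % group_size  # codes needed to fill the final group
    padded = codes + [pad_id] * pad
    return [tuple(padded[i:i + group_size]) for i in range(0, len(padded), group_size)]


def ungroup_codes(groups: List[Tuple[int, ...]], original_length: int) -> List[int]:
    """Flatten groups back into a code sequence and drop the padding."""
    return [code for group in groups for code in group][:original_length]
```

For example, 10 codes with a group size of 4 pack into 3 groups, the last one padded to `(8, 9, 0, 0)`, so the model would take 3 steps instead of 10.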
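Of the objective metrics listed, WER (word error rate) is the easiest to illustrate. It is usually obtained by transcribing the synthesized speech with a speech recognizer and comparing that transcript with the input text. The sketch below covers only the comparison, computed as a word-level edit distance divided by the number of reference words; the ASR model and any text normalization used in the actual evaluation are not shown, and `word_error_rate` is a hypothetical helper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For instance, comparing "the cat sat on the mat" with "the cat sit on mat" gives one substitution and one deletion, so WER = 2/6 ≈ 0.33. Lower is better.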

Ethical considerations and market responses:

Potential risks: VALL-E 2's powerful voice imitation has raised concerns about the abuse of deepfake technology.

Microsoft is treating this cautiously: it currently positions VALL-E 2 as a pure research project with no plans for productization, and it has included an ethics statement on the project page and in the paper emphasizing the need for synthetic speech detection and authorization mechanisms.

Some users were disappointed that Microsoft did not release a product they could try. Industry insiders speculate that Microsoft may be avoiding potential risks and negative publicity. As the technology matures and market competition intensifies, commercialization of VALL-E 2 or similar technologies may only be a matter of time.

Technical limitations and room for improvement:

Demo limitations: Only a few public demonstration samples are currently available, which makes it difficult to fully evaluate the model's performance.

Accent adaptability: The model's handling of accents other than British and American English still needs improvement.

Computational efficiency: Despite improvements, there is still room for optimization in inference speed.

The emergence of VALL-E 2 marks a new era for zero-shot TTS. It demonstrates the great potential of AI in speech synthesis and also prompts deeper reflection on the ethics and responsible use of technology. As the technology develops further, we can expect more innovative applications, but the industry, regulators, and the public will need to work together to ensure this powerful technology is used responsibly. In the future, VALL-E 2 and similar technologies are likely to bring revolutionary changes to voice assistants, content creation, and education and training, and will also drive advances in speech recognition and synthetic speech detection to counter potential abuse.

Project address: https://www.microsoft.com/en-us/research/project/vall-ex/vall-e-2/
