Amazon develops largest text-to-speech model ever, demonstrating "emergent capabilities"

A team of artificial intelligence researchers at Amazon has announced the development of what it says is the largest text-to-speech model to date, with the most parameters and the largest training dataset. The researchers have published a paper on the arXiv preprint server detailing the model's development and training process.

In recent years, "large language models" such as ChatGPT have attracted much attention for their ability to answer questions intelligently and generate sophisticated text. Artificial intelligence is also gradually making its way into other mainstream application areas. In this new project, the researchers set out to improve the capabilities of text-to-speech applications by increasing the number of parameters and expanding the training dataset.

The new model, called Big Adaptive Streamable TTS with Emergent abilities (BASE TTS), has 980 million parameters and was trained on 100,000 hours of recordings taken from public websites, most of them in English. The researchers also gave the model examples of words and phrases in other languages, enabling it to correctly pronounce some common expressions, such as "au contraire" and "adios, amigo."

The Amazon team also tested smaller models trained on smaller datasets, hoping to find the point at which what the AI field calls "emergent capabilities" appear. This is the phenomenon in which an AI application, whether a large language model or a text-to-speech model, suddenly breaks through to a higher level of capability. They found that for text-to-speech applications, this leap occurred with a medium-sized dataset, at a model size of 150 million parameters.

The researchers also noted that this leap spans a range of language attributes, such as the ability to handle compound nouns, convey emotions, pronounce foreign words, apply paralinguistics and punctuation, and correctly emphasize key words in sentences.

The research team said that, because of concerns about potential misuse, BASE TTS will not be released to the public. They plan to use it as a learning application and hope to apply what they learn to improve the overall sound quality of text-to-speech applications.
