Remember ChatTTS, the best-in-class Chinese voice AI I recommended before? This text-to-speech project, billed as a stand-in for GPT-4o's voice, went viral as soon as it launched and has already gained 16.9K stars on GitHub.
ChatTTS has now launched its official website, so anyone can try it directly online.
Main functions:
Text-to-speech: Enter text in the text box and ChatTTS generates the corresponding speech, automatically adjusting rhythm and pauses.
Real-time voice conversation: Combined with a large language model, it supports real-time spoken conversations.
Adjust the timbre: Under "Audio Seed", you can pick a speaker's timbre by entering a number, or roll the dice to get a random timbre.
Control details: Users can insert special markers such as [laugh] and [uv_break] into the text to manually control effects such as laughter and pauses.
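Because these markers are just tokens in the input string, they can be added programmatically before the text is submitted. The sketch below is a minimal illustration under stated assumptions: the helper name `annotate` and the rule "pause after every comma" are hypothetical choices for this example, not part of ChatTTS; only the token spellings [uv_break] and [laugh] come from the text above.

```python
import re

# Token spellings come from the article; the placement rules are hypothetical.
PAUSE = "[uv_break]"
LAUGH = "[laugh]"


def annotate(text: str, laugh_at_end: bool = False) -> str:
    """Hypothetical preprocessor: add a pause after each comma, optionally end with a laugh."""
    # Match an ASCII or full-width comma plus any following whitespace,
    # and replace it with the comma, one space, the pause token, one space.
    out = re.sub(r"([,，])\s*", lambda m: f"{m.group(1)} {PAUSE} ", text).strip()
    if laugh_at_end:
        out = f"{out} {LAUGH}"
    return out


if __name__ == "__main__":
    print(annotate("Well, that was fun", laugh_at_end=True))
```

For example, `annotate("Well, that was fun", laugh_at_end=True)` returns `Well, [uv_break] that was fun [laugh]`. A simple rule like this is only a starting point; placing markers by hand usually gives more natural results.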
ChatTTS’s Outstanding Features
Multi-language support: Besides Chinese, ChatTTS generates natural, fluent English speech. Its mixed Chinese-English output is excellent, with almost no audible trace of AI generation.
Fine-grained control: ChatTTS lets users control laughter, pauses between phrases, and interjections, making the generated speech more natural and vivid.
Multi-speaker support: ChatTTS supports multi-speaker speech synthesis and can reproduce a wide range of voices, including the classic voices of deceased figures.
Large-scale training data: The largest ChatTTS model was trained on more than 100,000 hours of Chinese and English data. The open-source version released on HuggingFace was trained on 40,000 hours of data and has not undergone supervised fine-tuning (SFT).
Application scenarios of ChatTTS
ChatTTS is suitable for various scenarios that require high-quality speech synthesis, including but not limited to:
E-commerce live streaming: Provides more natural voiceovers for live streams, improving the viewer experience.
Self-media (we-media): Helps independent content creators produce vivid voiceovers that attract a larger audience.
Online education: Provides clear, natural audio for online courses to improve learning outcomes.
Customer service and after-sales support: Provides more human-sounding voice services to improve customer satisfaction.
Online use
Official website address: https://chattts.com/
Project address: https://www.1ai.net/11978.html
Text: the content to be converted into speech.
Refine text: choose whether to automatically optimize the input text.
Randomness: controls how random the output is. The larger the value, the more varied the generated speech, which can make quality inconsistent: sometimes better, sometimes worse.
Sound selection: defaults to 2222. This numeric parameter selects the voice. The preset values are 2222, 7869, 6653, 4099, and 5099; pick any of them, or enter another number to get a random voice.
Customize sound: a positive integer used to customize the pitch and timbre of the voice. If set, it takes precedence and Sound selection is ignored.
Prompt settings: used to add laughter, pauses, etc. For example, it can be set to [oral_2][laugh_0][break_6].
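To make the interplay of these parameters concrete, here is a minimal Python sketch under several assumptions. It encodes the precedence rule described above (Customize sound overrides Sound selection), builds and range-checks a prompt string like the example, and shows why a higher Randomness value gives more varied output, assuming that setting acts as a sampling temperature, as such knobs usually do. `speaker_vector` is a hypothetical stand-in for the speaker embedding a seed would select, and the token ranges (oral 0-9, laugh 0-2, break 0-7) are an assumption based on the ChatTTS project README. None of these helpers are ChatTTS's actual API.

```python
import math
import random
from typing import List, Optional

# Preset voice numbers listed on the site; 2222 is the default.
PRESET_SEEDS = (2222, 7869, 6653, 4099, 5099)
DEFAULT_SEED = 2222

# Assumed token ranges (oral 0-9, laugh 0-2, break 0-7), based on the
# ChatTTS README; treat them as an assumption, not a guarantee.
TOKEN_RANGES = {"oral": range(10), "laugh": range(3), "break": range(8)}


def resolve_seed(selected: int = DEFAULT_SEED, custom: Optional[int] = None) -> int:
    """Return the seed to use: a set 'Customize sound' value overrides 'Sound selection'."""
    if custom is not None:
        if custom <= 0:
            raise ValueError("custom sound must be a positive integer")
        return custom
    return selected


def speaker_vector(seed: int, dim: int = 8) -> List[float]:
    """Hypothetical stand-in for a speaker embedding: same seed, same vector."""
    rng = random.Random(seed)  # private generator, so results are reproducible
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def build_prompt(oral: int, laugh: int, brk: int) -> str:
    """Build a prompt string such as [oral_2][laugh_0][break_6], checking ranges."""
    values = {"oral": oral, "laugh": laugh, "break": brk}
    for name, value in values.items():
        if value not in TOKEN_RANGES[name]:
            raise ValueError(f"{name}_{value} is outside the assumed range")
    return "".join(f"[{name}_{value}]" for name, value in values.items())


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn scores into probabilities; a higher temperature flattens them (more random)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    for seed in PRESET_SEEDS:
        print(seed, [round(v, 2) for v in speaker_vector(seed, dim=3)])
    print(build_prompt(2, 0, 6))  # [oral_2][laugh_0][break_6]
    print(softmax_with_temperature([2.0, 1.0, 0.0], 0.3))  # sharply peaked
    print(softmax_with_temperature([2.0, 1.0, 0.0], 1.5))  # flatter
```

Running it prints a reproducible three-number vector for each preset seed, then the prompt string, then two probability lists. The same seed always yields the same vector, which is the property that makes a plain number usable as a voice selector; the flatter second list illustrates why a high Randomness setting makes quality less predictable.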
To reiterate, a key advantage of this model is that it is open source, so it can be trained on your own voice data.
When using it, please comply with applicable laws, regulations, and ethical standards.
In addition, someone has made a ChatTTS Web UI, which can be deployed by yourself: https://github.com/jianchang512/ChatTTS-ui