ChatTTS has been quite popular in recent days. It claims to be a text-to-speech model designed specifically for conversational scenarios. I downloaded it to try it out and found that the open source version still falls well short of the promotional video. Reportedly the limitation is intentional; the project states:

ChatTTS is a powerful text-to-speech system. However, it is important to use this technology responsibly and ethically. To limit the use of ChatTTS, we added a small amount of high-frequency noise during the training of the 40,000-hour model and used MP3 format to degrade the sound quality as much as possible, to prevent criminals from using it for potential crimes. At the same time, we have trained a detection model internally and plan to open it in the future.

At least it is usable, so let's first build a web interface and a ready-to-run package to make it easier to use. This article has three main parts:

1. Deploy ChatTTS from source code

2. Build the web interface

3. Use it in the video translation and dubbing tool

Open source address: https://github.com/jianchang512/chatTTS-ui

Deploy ChatTTS from source code

1. Assume the code will be stored in E:/python/chat. Make sure the chat directory is empty, open it, type cmd in the address bar and press Enter, then run git clone https://github.com/2noise/ChatTTS . (the trailing dot clones into the current directory). The Git client can be downloaded from https://github.com/git-for-windows/git/releases/download/v2.45.1.windows.1/Git-2.45.1-64-bit.exe

2. Install the dependencies: pip install -r requirements.txt

3. For ease of use, install two additional modules: pip install modelscope soundfile

4. Download the model. By default it is downloaded from huggingface.co, which cannot be reached from some networks without a proxy, so use modelscope instead.

Key Code

    from modelscope import snapshot_download

    # Download to the models folder in the current directory and return the local model directory
    CHATTTS_DIR = snapshot_download('pzc163/chatTTS', cache_dir="./models")

Then, when calling load_models, set the source to local and pass the local model path

    chat = ChatTTS.Chat()
    chat.load_models(source="local", local_path=CHATTTS_DIR)

Test it out


    import ChatTTS
    from modelscope import snapshot_download

    CHATTTS_DIR = snapshot_download('pzc163/chatTTS', cache_dir="./models")
    chat = ChatTTS.Chat()
    chat.load_models(source="local", local_path=CHATTTS_DIR)
    wavs = chat.infer(["Do you know I'm waiting for you? Do you really care about me?"], use_decoder=True)

wavs[0] holds the generated audio data. There is one pitfall here: the IPython Audio call in the official example may fail to play it. Instead, save the audio to a local file with soundfile and play that file.

    import soundfile as sf

    # Save the first result as a 24000 Hz WAV file
    sf.write('1.wav', wavs[0][0], 24000)

If nothing unexpected happens, you should be able to hear a relatively realistic human voice.
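If soundfile cannot be installed, the standard-library wave module can write a comparable file. The sketch below is a stand-in under stated assumptions, not what the project does: it assumes the samples are floats in the range [-1, 1] at 24000 Hz (the rate used above), and it uses a generated sine tone in place of real ChatTTS output. The helper name write_wav_16bit and the output path are hypothetical.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav_16bit(path, samples, sample_rate=24000):
    """Write float samples in [-1.0, 1.0] as a mono 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the valid range
        frames += struct.pack('<h', int(s * 32767))
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))


# Assumption: a one-second 440 Hz tone stands in for wavs[0][0]
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(24000)]
out_path = os.path.join(tempfile.gettempdir(), 'chattts_tone.wav')
write_wav_16bit(out_path, tone)
```

With real output, the tone list would be replaced by the sample array returned by chat.infer.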

Build a web interface

Flask is a natural first choice for simple pages, with waitress as the WSGI server.

First, install the dependencies: pip install flask waitress

Then set the static directory and the template directory:

    from flask import Flask, render_template, send_from_directory

    app = Flask(__name__,
                static_folder='./static',
                static_url_path='/static',
                template_folder='./templates')

    # Serve files from the static directory
    @app.route('/static/<path:filename>')
    def static_files(filename):
        return send_from_directory(app.static_folder, filename)

    # Home page
    @app.route('/')
    def index():
        return render_template("index.html")

Create an API endpoint that synthesizes the received text into speech. WAVS_DIR (the output directory) and WEB_ADDRESS (host and port) are configuration values defined elsewhere in the project.

    import datetime
    import hashlib

    import soundfile as sf
    import torch

    # params
    # text: text to be synthesized
    # voice: timbre (integer used as the random seed)
    # prompt: prompt passed to params_refine_text
    @app.route('/tts', methods=['GET', 'POST'])
    def tts():
        # original string
        text = request.args.get("text", "").strip() or request.form.get("text", "").strip()
        prompt = request.form.get("prompt", '')
        try:
            voice = int(request.form.get("voice", '2222'))
        except Exception:
            voice = 2222
        speed = 1.0
        try:
            speed = float(request.form.get("speed", 1))
        except Exception:
            pass
        if not text:
            return jsonify({"code": 1, "msg": "text params lost"})

        texts = [text]
        std, mean = torch.load(f'{CHATTTS_DIR}/asset/spk_stat.pt').chunk(2)
        torch.manual_seed(voice)
        rand_spk = torch.randn(768) * std + mean
        wavs = chat.infer(texts, use_decoder=True,
                          params_infer_code={'spk_emb': rand_spk},
                          params_refine_text={'prompt': prompt})

        # File name: timestamp plus an MD5 hash of the request parameters
        md5_hash = hashlib.md5()
        md5_hash.update(f"{text}-{voice}-{speed}-{prompt}".encode('utf-8'))
        datename = datetime.datetime.now().strftime('%Y%m%d-%H_%M_%S')
        filename = datename + '-' + md5_hash.hexdigest() + ".wav"
        sf.write(WAVS_DIR + '/' + filename, wavs[0][0], 24000)
        return jsonify({"code": 0, "msg": "ok", "filename": WAVS_DIR + '/' + filename,
                        "url": f"http://{WEB_ADDRESS}/static/wavs/{filename}"})
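The file-naming step in the endpoint can be looked at on its own. The following is a minimal sketch of that idea: a timestamp followed by an MD5 hash of the request parameters. The helper name build_wav_filename and its now argument are hypothetical, added only so the result can be checked with a fixed time.

```python
import datetime
import hashlib


def build_wav_filename(text, voice, speed, prompt, now=None):
    """Return '<YYYYmmdd-HH_MM_SS>-<md5 of the parameters>.wav'."""
    now = now or datetime.datetime.now()
    key = f"{text}-{voice}-{speed}-{prompt}".encode('utf-8')
    digest = hashlib.md5(key).hexdigest()  # 32 hex characters
    return now.strftime('%Y%m%d-%H_%M_%S') + '-' + digest + '.wav'


# A fixed timestamp keeps the example reproducible
fixed = datetime.datetime(2024, 6, 1, 12, 30, 5)
name = build_wav_filename("hello", 2222, 1.0, "", now=fixed)
print(name)
```

Because the hash covers the text and the parameters, two requests made in the same second only get the same file name if their parameters are identical.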

Note how the timbre is obtained:

    std, mean = torch.load(f'{CHATTTS_DIR}/asset/spk_stat.pt').chunk(2)
    torch.manual_seed(voice)
    rand_spk = torch.randn(768) * std + mean

This picks a timbre at random, using the voice value as the random seed so that the same value reproduces the same timbre. ChatTTS does not currently provide a friendly interface for choosing a timbre.
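A numeric voice value maps to a stable timbre because seeding the generator makes the random draw repeatable. The sketch below illustrates only that principle, using Python's random module as a stand-in for torch. The dimension 768 comes from the code above; sample_speaker and the scalar std and mean defaults are made up for illustration (in ChatTTS they are loaded from spk_stat.pt).

```python
import random


def sample_speaker(seed, dim=768, std=1.0, mean=0.0):
    """Draw a reproducible pseudo-random vector: same seed, same vector."""
    rng = random.Random(seed)  # private generator seeded with the voice value
    return [rng.gauss(0.0, 1.0) * std + mean for _ in range(dim)]


a = sample_speaker(2222)
b = sample_speaker(2222)
c = sample_speaker(3333)
print(a == b)  # the same seed yields the same vector
print(a == c)  # a different seed yields a different vector
```

This is also why a voice number that sounds good is worth writing down: passing it again should reproduce that voice.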

Start Flask

    from flask import Flask, request, render_template, jsonify, send_file, send_from_directory
    from waitress import serve

    try:
        serve(app, host='127.0.0.1', port=9966)
    except Exception:
        pass

The front-end interface is implemented using bootstrap5, which is very simple and the code is omitted.

Test it with Python code

    import requests

    res = requests.post('http://127.0.0.1:9966/tts', data={
        "text": "Do you know I'm waiting for you? Do you really care about me?",
        "prompt": "",
        "voice": "2222",
    })
    print(res.json())
    # ok:    {code:0, msg:'ok', filename:filename.wav, url:http://127.0.0.1:9966/static/wavs/filename.wav}
    # error: {code:1, msg:"error"}
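A caller will usually want to branch on the code field rather than just print the response. Here is a small sketch of that check, based only on the two response shapes shown above; audio_url_from_response is a hypothetical helper, and the dictionaries stand in for what res.json() returns.

```python
def audio_url_from_response(payload):
    """Return the audio URL from a /tts response, or raise if synthesis failed."""
    if payload.get("code") != 0:
        raise RuntimeError(f"TTS failed: {payload.get('msg', 'unknown error')}")
    return payload["url"]


# Stand-ins for res.json() in the success and failure cases
ok = {"code": 0, "msg": "ok", "filename": "filename.wav",
      "url": "http://127.0.0.1:9966/static/wavs/filename.wav"}
failed = {"code": 1, "msg": "error"}

print(audio_url_from_response(ok))
try:
    audio_url_from_response(failed)
except RuntimeError as exc:
    print(exc)
```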

Used in video translation and dubbing

1. Deploy the ChatTTS UI project, using either the pre-packaged Windows version or the source code, and start it. The project's open source address is https://github.com/jianchang512/chatTTS-ui

2. Upgrade the video translation and dubbing software to version 1.82 or later. Download address: https://pyvideotrans.com/downpackage.html

3. In the video translation and dubbing software, go to Menu - Settings - ChatTTS and enter the HTTP address in the address field. The default is http://127.0.0.1:9966


4. ChatTTS is now ready to use for dubbing
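If the dubbing software cannot reach ChatTTS, a quick first check is whether anything is listening on the configured address. This sketch uses only the standard library; is_listening is a hypothetical helper, and the host and port are the defaults mentioned in the steps above.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


print(is_listening("127.0.0.1", 9966))
```

A True result only shows that some service accepts connections on that port, not that it is the ChatTTS UI.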
