MuseTalk is a real-time, high-quality, audio-driven lip-sync model developed by Tencent Music Entertainment's Lyra Lab (Tianqin Lab), designed for generating mouth movements for virtual characters. Given an input audio signal, it automatically modifies the face of a digital character so that the lip movements stay closely synchronized with the speech. MuseTalk generates accurate lip shapes while keeping the rest of the image consistent, and it works especially well on videos of real people.
The main features of MuseTalk include:
- Real-time performance: inference runs at more than 30 frames per second on an NVIDIA Tesla V100.
- Multi-language support: it accepts audio in multiple languages, such as Chinese, English and Japanese, so it can serve users in different countries and regions.
- High-precision lip sync: using latent-space inpainting, it edits the lips precisely within a 256 x 256 pixel face region.
- High picture consistency: the generated lip shapes match the audio accurately, and the rest of the image stays consistent.
- Wide range of applications: it suits many kinds of video work, such as independent content creation and virtual streamers.
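The 30 fps figure translates directly into a time budget: to keep up in real time, each frame has to be produced end to end in 1000 / 30 ≈ 33.3 ms. A small helper for that arithmetic is sketched below; the function names are hypothetical and not part of MuseTalk.

```python
def frame_budget_ms(fps: float) -> float:
    """Maximum time, in milliseconds, available to produce one frame at `fps`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def is_real_time(per_frame_ms: float, target_fps: float = 30.0) -> bool:
    """True if a measured per-frame latency keeps up with the target frame rate."""
    return per_frame_ms <= frame_budget_ms(target_fps)


print(f"{frame_budget_ms(30):.1f} ms per frame at 30 fps")  # 33.3 ms per frame at 30 fps
print(is_real_time(25.0))  # True: 25 ms fits inside the 33.3 ms budget
print(is_real_time(40.0))  # False: 40 ms per frame is only 25 fps
```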
However, deploying MuseTalk locally is cumbersome for beginners, and it demands a capable graphics card and plenty of memory. Fortunately, with Google Colab we can deploy MuseTalk quickly, easily and for free. Google Colab (short for Colaboratory) is a free cloud development environment from Google, used mainly for data analysis, machine learning and deep learning. It is based on Jupyter Notebook: you write and run Python code directly in the browser, and you can share notebooks and edit them together with others.
First, open this address:
https://colab.research.google.com/github/camenduru/MuseTalk-jupyter/blob/main/MuseTalk_jupyter.ipynb
Click the menu in the upper-right corner, choose Change runtime type, and select T4 GPU.
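To confirm that the runtime switch took effect before running the notebook, you can run a check like the one below in a new cell (on Colab, simply running `!nvidia-smi` in a cell also works). This is a sketch that assumes the NVIDIA `nvidia-smi` tool is on the PATH, as it normally is on GPU runtimes; on a T4 runtime the reported name should mention T4.

```python
import shutil
import subprocess
from typing import Optional


def gpu_name() -> Optional[str]:
    """Return the name of the first NVIDIA GPU, or None if no GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # tool not found: most likely a CPU-only runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None  # driver present but no usable GPU reported
    return result.stdout.strip().splitlines()[0]


name = gpu_name()
print(name or "No GPU detected; check Runtime > Change runtime type")
```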
Colab has now allocated us, free of charge, 12 GB of RAM, 78 GB of disk space and a GPU.
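The exact allocation can differ between sessions. A standard-library-only way to check memory and disk from a notebook cell is sketched below; it assumes a Linux runtime (Colab's is), because `os.sysconf` with these names is POSIX-specific.

```python
import os
import shutil
from typing import Tuple

GIB = 1024 ** 3


def total_ram_gib() -> float:
    """Total physical memory in GiB, from the page size times the page count."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB


def disk_gib(path: str = "/") -> Tuple[float, float]:
    """Return (total, free) space in GiB for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.total / GIB, usage.free / GIB


ram = total_ram_gib()
disk_total, disk_free = disk_gib()
print(f"RAM: {ram:.1f} GiB, disk: {disk_free:.1f} of {disk_total:.1f} GiB free")
```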
Click the small triangle (the Run button) to run the cell.
After about 3 minutes, the cell finishes running.
When the output shows the line "Running on public URL", MuseTalk has been deployed successfully. Click that URL to open the web interface.
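If the cell output is long, the link can be pulled out programmatically, for example from output you have copied into a text file. "Running on public URL" is the message Gradio prints when it creates a share link; that this notebook's interface is a Gradio app, and the sample URL below, are assumptions for illustration only.

```python
import re
from typing import Optional

# Matches a line such as "Running on public URL: https://<id>.gradio.live"
PUBLIC_URL_RE = re.compile(r"Running on public URL:\s*(https://\S+)")


def find_public_url(log_text: str) -> Optional[str]:
    """Return the first public URL announced in the log, or None if absent."""
    match = PUBLIC_URL_RE.search(log_text)
    return match.group(1) if match else None


sample = (
    "Running on local URL:  http://127.0.0.1:7860\n"
    "Running on public URL: https://abc123.gradio.live\n"
)
print(find_public_url(sample))  # https://abc123.gradio.live
```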
Upload an audio file and a reference video.
After the upload, the video takes a little over 10 seconds to process.
Then click Generate.
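Longer inputs take longer to process, so it can help to check the audio length before uploading. The helper below is a sketch that only handles uncompressed PCM `.wav` files, since it relies on Python's built-in `wave` module; for MP3 or video files you would need another tool such as ffprobe. The function names and the 20-second default limit are illustrative choices.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file in seconds: frame count / sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def needs_trimming(path: str, limit_s: float = 20.0) -> bool:
    """True if the WAV file is longer than `limit_s` seconds."""
    return wav_duration_seconds(path) > limit_s
```

For example, `wav_duration_seconds("speech.wav")` returns the length of a local file named `speech.wav` (a hypothetical name) in seconds.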
If an error appears with the message "Connection errored out.", shorten both the video and the audio to about 20 seconds and run it again.
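One way to shorten the clips is with ffmpeg, using its real `-t` (output duration) and `-c copy` (stream copy) options. The sketch below only builds the command; it assumes ffmpeg is installed wherever you run it, and the file names are hypothetical. Stream copy avoids re-encoding and is fast; if a trimmed video plays back incorrectly, drop `-c copy` so ffmpeg re-encodes instead.

```python
import shlex
from typing import List


def trim_command(src: str, dst: str, seconds: float = 20.0) -> List[str]:
    """Build an ffmpeg command that keeps only the first `seconds` of `src`."""
    return [
        "ffmpeg", "-y",        # overwrite dst if it already exists
        "-i", src,
        "-t", f"{seconds:g}",  # limit the output duration
        "-c", "copy",          # copy streams without re-encoding
        dst,
    ]


print(shlex.join(trim_command("reference.mp4", "reference_20s.mp4")))
# ffmpeg -y -i reference.mp4 -t 20 -c copy reference_20s.mp4
print(shlex.join(trim_command("speech.wav", "speech_20s.wav")))
# ffmpeg -y -i speech.wav -t 20 -c copy speech_20s.wav
```

To actually run a command, pass the list to `subprocess.run(cmd, check=True)`.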
This last step is the slowest, usually taking more than 20 minutes.
When the video appears on the right, processing is complete.
Then click the download icon in the upper-right corner to save the processed video.