Making the Mona Lisa rap: Microsoft releases VASA-1, a model that turns a picture and audio into short videos

Microsoft Research Asia recently published a paper on its new VASA-1 model. All the user needs to provide is a static portrait image and a voice audio clip, and the model automatically makes the person in the image speak.

What is particularly striking about VASA-1 is its ability to reproduce natural facial expressions, a range of emotions, and accurate lip synchronization. Most importantly, the output shows virtually no artificial traces, making it hard to identify as generated unless examined closely.

The researchers acknowledge that, like comparable models, VASA-1 still cannot properly handle non-rigid elements such as hair, but they report that its overall results are superior to those of similar models.


The researchers also say that VASA-1 generates short, dynamic videos at 512×512 resolution, running at 45 fps in offline batch-processing mode and 40 fps in online live-streaming mode with a latency of only 170 ms. The entire generation pipeline runs on a single computer equipped with an NVIDIA RTX 4090 graphics card.
