Google launches multimodal VLOGGER AI: making static portraits move and "talk"

Google recently published a blog post on its GitHub page introducing the VLOGGER AI model. Users only need to provide a portrait photo and an audio clip, and the model animates the person so that they "speak" the audio with rich facial expressions.

VLOGGER is a multimodal diffusion model for virtual portraits. It was trained on the MENTOR dataset, which contains more than 800,000 people and more than 2,200 hours of video. This allows VLOGGER to generate portrait videos of people of different ethnicities, ages, clothing, and poses.

The researchers said: "Compared to previous multimodal methods, VLOGGER has the advantages of not requiring training for each individual, not relying on face detection and cropping, generating complete images (not just faces or lips), and considering a wide range of scenarios (such as visible torsos or different subject identities), which are critical for correctly synthesizing communicating humans."

Google sees VLOGGER as a step toward a "universal chatbot" that could interact with humans naturally through voice, gestures, and eye contact.

Other potential applications include reporting, education, and narration. VLOGGER can also edit existing videos: if you are not satisfied with a subject's facial expressions, you can adjust them.

Paper reference:

VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
