Hallo, an open source project from Fudan University that generates talking-head videos from an audio clip and a still image, now has a ComfyUI plugin. Installation pulls in a fair number of dependencies and the barrier to entry is relatively high, but having Hallo available inside the ComfyUI ecosystem opens up more possibilities, and more fun, for downstream steps such as repainting.
Given a portrait photo and an audio track, Hallo makes the face speak along with the audio, with matching expressions, and the result looks very natural. The project uses an end-to-end diffusion paradigm and introduces a hierarchical audio-driven visual synthesis module that improves how precisely the visual output (lip movement, expression, and head pose) is aligned with the audio input.
This module also provides adaptive control over the diversity of expressions and poses, which makes personalization across different identities more effective. In other words, no matter whose face is in the picture, Hallo can generate a talking video that looks natural, as if a real person were speaking.
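To make the idea of "adaptive control" over lips, expressions, and poses a little more concrete, here is a toy sketch in Python. It is not Hallo's code and does not use the plugin's API: `RegionWeights` and `blend_audio_features` are hypothetical names, and the assumption that each face region contributes a separate audio-conditioned feature vector that is scaled by its own weight is only an illustration of the concept described above.

```python
# Illustrative sketch only: NOT Hallo's actual implementation.
# All names here are hypothetical and exist just for this example.
from dataclasses import dataclass


@dataclass
class RegionWeights:
    """Per-region strength of the audio-driven motion (hypothetical knobs)."""
    lip: float = 1.0
    expression: float = 1.0
    pose: float = 1.0


def blend_audio_features(lip, expression, pose, weights):
    """Combine per-region audio-conditioned features into one feature vector.

    Each of lip/expression/pose is a list of floats of equal length, standing
    in for the audio-driven features computed for that face region.
    """
    if not (len(lip) == len(expression) == len(pose)):
        raise ValueError("all region features must have the same length")
    # Weighted sum: a larger weight lets that region react more to the audio.
    return [
        weights.lip * l + weights.expression * e + weights.pose * p
        for l, e, p in zip(lip, expression, pose)
    ]


if __name__ == "__main__":
    lip = [1.0, 0.0, 0.0]
    expression = [0.0, 1.0, 0.0]
    pose = [0.0, 0.0, 1.0]
    # Keep lip sync at full strength, tone down expression and head motion.
    calm = RegionWeights(lip=1.0, expression=0.5, pose=0.25)
    print(blend_audio_features(lip, expression, pose, calm))
```

In this toy version, lowering the `expression` and `pose` weights keeps the lip signal untouched while making the face and head move less, which is the kind of per-identity tuning the paragraph above refers to.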
Installing Hallo may be relatively complicated, but its arrival has brought new energy to the open source ecosystem. As the technology continues to evolve, we can expect more projects like it to appear, bringing more convenience and fun.
Plugin address: https://github.com/AIFSH/ComfyUI-Hallo