HyperHuman: A New AI Framework for Generating Hyper-Realistic Humans with Latent Structural Diffusion

A new AI framework called HyperHuman, developed by researchers at Snap Inc. and partner universities, has recently been unveiled for generating hyper-realistic human images. Its key contribution is building structural information into the diffusion process, which addresses the difficulties earlier models had in generating coherent human images.

Users do not need professional skills. They only need to provide conditions such as text and pose, and HyperHuman generates highly realistic human images from them. This matters for applications such as image animation and virtual try-on. Previous methods either rely on variational autoencoders (VAEs) trained in a reconstruction manner or improve realism with generative adversarial networks (GANs). However, because of unstable training and limited model capacity, these methods are often applicable only to small-scale datasets, so the generated images lack diversity.

The HyperHuman framework builds on diffusion models (DMs), which have become the dominant architecture in generative AI. Existing text-to-image (T2I) diffusion models still struggle to generate human images with coherent structure, largely because of the non-rigid deformation of the human body. HyperHuman addresses this problem by combining two modules: a Latent Structural Diffusion Model and a Structure-Guided Refiner. Together they model image appearance, spatial relationships, and geometry in a unified network.

The key insight behind HyperHuman is that human images are structured at multiple levels of granularity, from the coarse body skeleton to fine-grained spatial geometry. To exploit this, the researchers built a large-scale human-centric dataset called HumanVerse, which contains 340 million in-the-wild human images with detailed annotations. Based on this dataset, they designed the two key modules, the Latent Structural Diffusion Model and the Structure-Guided Refiner. The former extends a pre-trained diffusion backbone to denoise RGB, depth, and surface normals simultaneously, which keeps texture and structure spatially aligned. The latter takes the predicted, spatially aligned structural maps as conditions to generate more detailed, higher-resolution images.
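To make the idea of denoising RGB, depth, and surface normals together more concrete, the following is a minimal sketch of the forward (noising) side of such a setup. It is not HyperHuman's code: the linear beta schedule, the function names (`alpha_bar`, `add_noise`, `noise_jointly`), and the toy four-element "latents" are all assumptions made for illustration. The one design choice it illustrates is drawn from the article: every modality is noised at the same timestep.

```python
import math
import random

def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta) up to step t for an assumed linear schedule."""
    prod = 1.0
    for s in range(t + 1):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def add_noise(x0, a_bar, rng):
    """Standard forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
           for x, e in zip(x0, eps)]
    return x_t, eps

def noise_jointly(latents, t, rng):
    """Noise every modality at the SAME timestep t, each with independent noise."""
    a_bar = alpha_bar(t)
    return {name: add_noise(x0, a_bar, rng) for name, x0 in latents.items()}

# Toy stand-ins for the three modality latents (hypothetical values).
rng = random.Random(0)
latents = {
    "rgb": [0.5, -0.2, 0.1, 0.9],
    "depth": [0.3, 0.3, 0.4, 0.4],
    "normal": [0.0, 1.0, 0.0, -1.0],
}
t = rng.randrange(1000)            # one timestep shared by all branches
noisy = noise_jointly(latents, t, rng)
```

A joint denoiser would then receive all three noisy latents and be trained to predict the three noise vectors, so that the network sees texture and geometry at a matching noise level.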

In addition, HyperHuman adopts a robust conditioning scheme to mitigate the error accumulation that arises in a two-stage generation pipeline. A carefully designed noise schedule eliminates low-frequency information leakage, which helps keep depth and surface-normal values uniform within local regions. The branches for the different modalities are all trained at the same timestep, which promotes feature fusion between them. Taken together, these design choices let the model handle structure and texture in a unified way.
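The article does not say how the noise schedule removes low-frequency leakage. One known technique with that effect is rescaling the schedule so that the signal-to-noise ratio at the final timestep is exactly zero. With an ordinary schedule the last training step still contains a small amount of signal (for example, overall brightness), whereas sampling starts from pure noise. The sketch below shows that rescaling in plain Python. Treating it as the schedule HyperHuman uses is an assumption, and the linear beta range and function names are illustrative only.

```python
import math

def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """An assumed linear beta schedule, used only as a starting point."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * i for i in range(num_steps)]

def cumulative_alpha_bar(betas):
    """alpha_bar[t] = product over s <= t of (1 - beta[s])."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def rescale_zero_terminal_snr(betas):
    """Shift and scale sqrt(alpha_bar) so its last value is 0 and its first is kept."""
    sqrt_ab = [math.sqrt(a) for a in cumulative_alpha_bar(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    scale = first / (first - last)
    sqrt_ab = [(s - last) * scale for s in sqrt_ab]
    ab = [s * s for s in sqrt_ab]
    # Recover per-step alphas from the cumulative products, then the betas.
    alphas = [ab[0]] + [ab[i] / ab[i - 1] for i in range(1, len(ab))]
    return [1.0 - a for a in alphas]

def snr(a_bar):
    """Signal-to-noise ratio of x_t for a given alpha_bar."""
    return a_bar / (1.0 - a_bar)

old_betas = linear_betas()
new_betas = rescale_zero_terminal_snr(old_betas)
old_ab = cumulative_alpha_bar(old_betas)
new_ab = cumulative_alpha_bar(new_betas)
print("terminal SNR before:", snr(old_ab[-1]))
print("terminal SNR after: ", snr(new_ab[-1]))
```

After rescaling, the final step carries no signal at all, so the model can no longer rely on leaked low-frequency content at training time that will be absent at inference time.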

Comparisons with current state-of-the-art methods show that HyperHuman produces higher-quality images. In the first set of results, a 4×4 grid shows the input skeleton alongside the jointly denoised normals, depth, and coarse RGB (512×512) computed by HyperHuman.

The emergence of HyperHuman provides a new method for generating ultra-realistic human images, breaking through the limitations of previous models and bringing broader possibilities for future applications such as virtual try-on and image animation.

Project URL: https://snap-research.github.io/HyperHuman/

Paper URL: https://arxiv.org/abs/2310.08579
