October 14, 2024 - Zhipu AI's technical team today announced that it has open-sourced its text-to-image models CogView3 and CogView3-Plus-3B, and that the capabilities of this model series are now live in the "Zhipu Qingyan" app.
CogView3 is a text-to-image model based on cascaded diffusion, and it consists of the following three stages:
- Stage 1: Generates a 512×512 low-resolution image using a standard diffusion process.
- Stage 2: Performs 2× super-resolution with a relay diffusion process, producing a 1024×1024 image from the 512×512 input.
- Stage 3: Applies relay diffusion super-resolution once more to the Stage 2 output, producing a 2048×2048 high-resolution image.
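The relay idea behind Stages 2 and 3 can be sketched in a few lines. This is a toy illustration under stated assumptions, not CogView3's code: images are tiny grayscale lists of floats, `upsample2x`, `relay_start`, and `check_schedule` are hypothetical helpers, and the starting point is built by plain Gaussian noising of the upsampled image, whereas actual relay diffusion also applies a blurring schedule. The point is only that a super-resolution stage starts from a partially noised copy of the previous stage's output rather than from pure noise.

```python
import math
import random
from typing import List, Optional, Tuple

Image = List[List[float]]  # toy grayscale image: rows of pixel values

def upsample2x(img: Image) -> Image:
    """Nearest-neighbour 2x upsampling: each pixel becomes a 2x2 block."""
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def relay_start(low_res: Image, alpha_bar_t: float, rng: random.Random) -> Image:
    """Hypothetical helper: starting point for a super-resolution stage.

    Instead of pure noise, start from the upsampled previous-stage output,
    noised to an intermediate level alpha_bar_t (0 < alpha_bar_t < 1).
    Simplification (assumption): real relay diffusion also blurs the image.
    """
    up = upsample2x(low_res)
    a, s = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [[a * p + s * rng.gauss(0.0, 1.0) for p in row] for row in up]

# Resolution schedule described in the text: 512 -> 1024 -> 2048.
STAGES: List[Tuple[str, Optional[int], int]] = [
    ("base", None, 512),
    ("sr-1", 512, 1024),
    ("sr-2", 1024, 2048),
]

def check_schedule(stages: List[Tuple[str, Optional[int], int]]) -> bool:
    """Each super-resolution stage must consume the previous output and double it."""
    prev_out: Optional[int] = None
    for _name, inp, out in stages:
        if inp is not None and (inp != prev_out or out != 2 * inp):
            return False
        prev_out = out
    return True

if __name__ == "__main__":
    tiny = [[0.0, 1.0], [1.0, 0.0]]  # 2x2 stand-in for a 512x512 Stage 1 output
    start = relay_start(tiny, alpha_bar_t=0.5, rng=random.Random(0))
    print(len(start), len(start[0]))  # 4 4
    print(check_schedule(STAGES))  # True
```

In this sketch, `alpha_bar_t` controls how much of the low-resolution content survives into the starting point: values near 1 keep most of it, while values near 0 approach pure noise.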
According to the official figures, CogView3 outperforms SDXL, the current state-of-the-art open-source text-to-image diffusion model, by 77.0% in human evaluations, while requiring only about 1/10 of SDXL's inference time.
CogView3-Plus builds on CogView3 (ECCV'24) by adopting the latest DiT (Diffusion Transformer) framework to further improve overall performance. It reportedly uses a Zero-SNR diffusion noise schedule and introduces a joint text-image attention mechanism, which, compared with the commonly used MMDiT architecture, effectively reduces training and inference costs while preserving the model's core capabilities. CogView3-Plus also uses a VAE with a latent dimension of 16.
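The announcement does not give CogView3-Plus's exact noise schedule. As a hedged illustration of what "Zero-SNR" means, the sketch below takes an ordinary linear beta schedule and rescales it so the final timestep has a signal-to-noise ratio of exactly zero, following the rescaling procedure proposed by Lin et al. (2023). The linear schedule, its default values, and all function names here are assumptions chosen for illustration.

```python
import math
from typing import List

def linear_betas(n: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """A common linear beta schedule (illustrative defaults, not CogView3-Plus's)."""
    step = (beta_end - beta_start) / (n - 1)
    return [beta_start + i * step for i in range(n)]

def alphas_cumprod(betas: List[float]) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def rescale_zero_terminal_snr(betas: List[float]) -> List[float]:
    """Rescale betas so the last step has zero SNR (procedure of Lin et al., 2023).

    sqrt(alpha_bar) is shifted so its last value becomes 0 and scaled so its
    first value is unchanged; betas are then recovered from the new alpha_bar.
    """
    sqrt_ab = [math.sqrt(a) for a in alphas_cumprod(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    sqrt_ab = [(s - last) * first / (first - last) for s in sqrt_ab]
    ab = [s * s for s in sqrt_ab]
    # alpha_t = alpha_bar_t / alpha_bar_{t-1}; the final alpha is exactly 0.
    alphas = [ab[0]] + [ab[i] / ab[i - 1] for i in range(1, len(ab))]
    return [1.0 - a for a in alphas]

def snr(alpha_bar: float) -> float:
    """Signal-to-noise ratio of x_t = sqrt(ab) * x_0 + sqrt(1 - ab) * noise."""
    return alpha_bar / (1.0 - alpha_bar)

if __name__ == "__main__":
    betas = linear_betas(1000)
    print(snr(alphas_cumprod(betas)[-1]))  # small but nonzero
    rescaled = rescale_zero_terminal_snr(betas)
    print(snr(alphas_cumprod(rescaled)[-1]))  # 0.0
```

With a nonzero terminal SNR, the last training step still leaks a little signal, so the model never learns to start from pure noise; forcing it to zero makes training and sampling start from the same place.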
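The joint text-image attention mechanism is not described in detail in the announcement. A common form, sketched below under that assumption, concatenates text and image tokens into one sequence and runs a single self-attention over it, so every image token can attend to every text token and vice versa. For simplicity the sketch uses one head and shared projection matrices (`wq`, `wk`, `wv`, all hypothetical); MMDiT, by contrast, keeps separate weights per modality, and the source does not say how CogView3-Plus handles this.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def joint_attention(text: Matrix, image: Matrix,
                    wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head self-attention over the concatenated text + image sequence.

    Assumption: one set of Q/K/V weights shared by both modalities.
    Returns the updated text tokens and image tokens separately.
    """
    tokens = text + image  # one sequence: text tokens first, then image tokens
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose K
    scores = [[s * scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    out = matmul(weights, v)
    return out[: len(text)], out[len(text):]

if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]  # identity projections for a 2-dim toy example
    text_out, image_out = joint_attention([[1.0, 0.0]], [[0.0, 1.0], [0.5, 0.5]], eye, eye, eye)
    print(len(text_out), len(image_out))  # 1 2
```

Because both modalities share one attention matrix, the text token's output in this toy example mixes in image content, which is the cross-modal interaction the mechanism is meant to provide.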
Links:
Open-source repository:
- https://github.com/THUDM/CogView3
CogView3-Plus open-source model repositories:
- https://huggingface.co/THUDM/CogView3-Plus-3B
- https://modelscope.cn/models/ZhipuAI/CogView3-Plus-3B