At the Future Artificial Intelligence Pioneer Forum of the Zhongguancun Forum, Shengshu Technology and Tsinghua University jointly launched "Vidu", China's first large video model capable of generating long, highly consistent, and highly dynamic videos.
At the core of this leading video model is U-ViT, an architecture that fuses diffusion models with the Transformer. With a single click, Vidu can generate a 16-second high-definition video at 1080p resolution. It can simulate the real physical world and also produce highly imaginative content. Multi-shot generation and strong spatiotemporal consistency are what set Vidu apart.
Notably, since its release, Vidu has made significant breakthroughs and reached a level comparable to the world's top video models, and it is still being iterated and optimized. This achievement builds on the team's long-standing expertise and many original results in Bayesian machine learning and multimodal large models.
In particular, U-ViT, which the team proposed in September 2022, was the world's first architecture to fuse diffusion models with the Transformer, and it laid a solid foundation for Vidu. Then, in March 2023, the team again took the lead by open-sourcing UniDiffuser, a multimodal diffusion model built on the U-ViT architecture, which verified that U-ViT scales to large models.
Drawing on its deep understanding of U-ViT and its extensive engineering and data experience, the team overcame many key technical challenges in representing and processing long videos within a very short time and developed the Vidu video model. The model delivers clear gains in video coherence and dynamics, further advancing video processing technology.
The launch of Vidu once again confirms the strong performance of the U-ViT fusion architecture on large-scale visual tasks, and it demonstrates Shengshu Technology's sustained capacity for innovation and its industry-leading position in native multimodal large models. As a general-purpose visual model, Vidu can generate more diverse and longer video content, and its flexible architecture leaves ample room to support a wider range of modalities in the future and to push the boundaries of general multimodal capabilities.
Apply for access:
https://shengshu.feishu.cn/share/base/form/shrcnybSDE4Id1JnA5EQ0scv1Ph