At the Future Artificial Intelligence Pioneer Forum of the Zhongguancun Forum today, Shengshu Technology officially released Vidu, China's first long-duration, high-consistency, high-dynamic large video model. Media and industry insiders consider it the first Sora-level video model in China.
According to the official description, Vidu is built on U-ViT, an architecture that combines Diffusion and Transformer. It supports one-click generation of high-definition video up to 16 seconds long at resolutions up to 1080p.
The official promotional materials show a clip of "a ship in a studio sailing towards the camera," in which the waves and the ship look very realistic.
The company said that Vidu can not only simulate the real physical world but is also richly imaginative, and that it features multi-shot generation and high spatiotemporal consistency.
Vidu is the first large video model worldwide to achieve a major breakthrough since the release of Sora. Its performance is fully comparable to the top international level, and it continues to improve through rapid iteration.
Vidu's rapid breakthroughs stem from the team's long-term experience and many original achievements in Bayesian machine learning and multimodal large models.
Its core technology, the U-ViT architecture, was proposed by the team in September 2022, earlier than the DiT architecture adopted by Sora, and is the world's first architecture to integrate Diffusion and Transformer.
In March 2023, the team open-sourced UniDiffuser, the world's first multimodal diffusion model based on the U-ViT fusion architecture, becoming the first to complete large-scale scalability verification of the U-ViT architecture.