As Chinese companies launch their own AI video technology, the short drama market is entering a new round of change. Qihuoshan Company was the first in China to reproduce Sora's results, and it has made major breakthroughs with innovative technologies such as its Etna model.
The Etna model uses a Diffusion Transformer to process video data and can generate 15-second ultra-high-definition video at 4K resolution and 60 frames per second, while also understanding space, time, and deep semantics.
Official website address: https://etna.7volcanoes.com/
Paper address: https://arxiv.org/pdf/2212.09748.pdf
As the figure above shows, Etna holds clear advantages over existing models on the market in video duration, resolution, richness and vividness of detail, and semantic understanding.
Why was Qihuoshan the first company in China to reproduce Sora? Sora's key innovation is a Diffusion Transformer that can flexibly process data of different dimensions. A spatiotemporal compressor maps the original video into a latent space. A vision Transformer (ViT) then processes the latent representation, which has been split into patches, and outputs the denoised latent representation.
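The patch-splitting step described above can be illustrated with a minimal sketch. It assumes a single-channel latent stored as nested lists indexed [time][height][width] and a fixed patch size. The function names `patchify` and `unpatchify` and the toy shapes are hypothetical and are not taken from Sora or Etna.

```python
from itertools import product


def patchify(latent, pt, ph, pw):
    """Split a latent video [T][H][W] into flat spacetime patches (tokens)."""
    T, H, W = len(latent), len(latent[0]), len(latent[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    tokens = []
    # Visit patch origins in (time, height, width) order.
    for t0, h0, w0 in product(range(0, T, pt), range(0, H, ph), range(0, W, pw)):
        # Flatten one pt x ph x pw block into a single token.
        token = [latent[t0 + dt][h0 + dh][w0 + dw]
                 for dt, dh, dw in product(range(pt), range(ph), range(pw))]
        tokens.append(token)
    return tokens


def unpatchify(tokens, shape, pt, ph, pw):
    """Inverse of patchify: rebuild the [T][H][W] latent from its tokens."""
    T, H, W = shape
    latent = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    origins = product(range(0, T, pt), range(0, H, ph), range(0, W, pw))
    for token, (t0, h0, w0) in zip(tokens, origins):
        offsets = product(range(pt), range(ph), range(pw))
        for value, (dt, dh, dw) in zip(token, offsets):
            latent[t0 + dt][h0 + dh][w0 + dw] = value
    return latent


# Toy latent: 4 frames of 4 x 6 values (assumed sizes, for illustration only).
latent = [[[float(t * 100 + h * 10 + w) for w in range(6)]
           for h in range(4)] for t in range(4)]
tokens = patchify(latent, pt=2, ph=2, pw=3)
print(len(tokens), len(tokens[0]))
```

Because the number of tokens simply grows or shrinks with the latent's size, the same Transformer can in principle consume videos of different durations and resolutions, which is one way to read the "different dimensions" claim above.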
A CLIP-like system guides the diffusion model to generate videos with a specific style or theme, based on the user's instructions (enhanced by a large language model) and latent visual cues. After multiple denoising steps, the latent representation of the generated video is obtained and then mapped back to pixel space by the corresponding decoder.
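The loop of repeated denoising followed by decoding can be sketched as below. This is a toy illustration under strong assumptions, not the sampler used by Sora or Etna. It uses a deterministic DDIM-style update, a linear noise schedule chosen arbitrarily, and a hypothetical `toy_noise_predictor` that stands in for the conditioned Transformer. The stand-in predicts noise exactly by assuming the clean latent equals `target`, which here plays the role of the conditioning signal. `toy_decoder` is likewise a placeholder for the real latent-to-pixel decoder.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-3, beta_end=0.2):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def toy_noise_predictor(x_t, alpha_bar, target):
    """Hypothetical stand-in for the conditioned denoiser.

    Returns the exact noise under the assumption that the clean latent is `target`.
    """
    return [(x - math.sqrt(alpha_bar) * m) / math.sqrt(1.0 - alpha_bar)
            for x, m in zip(x_t, target)]


def sample(target, num_steps=50, seed=0):
    """Run a deterministic DDIM-style denoising loop from pure noise."""
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    for t in reversed(range(num_steps)):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0  # final step is noise-free
        eps = toy_noise_predictor(x, a_t, target)
        # Estimate the clean latent, then re-noise it to the previous level.
        x0_pred = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
                   for xi, e in zip(x, eps)]
        x = [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e
             for x0, e in zip(x0_pred, eps)]
    return x


def toy_decoder(latent):
    """Placeholder decoder: map latents in [-1, 1] to 8-bit pixel values."""
    return [max(0, min(255, round(127.5 * (v + 1.0)))) for v in latent]


target = [0.5, -0.25, 1.0]  # assumed conditioning target, for illustration
final_latent = sample(target)
pixels = toy_decoder(final_latent)
print(final_latent, pixels)
```

In a real system the noise predictor is a learned network conditioned on text embeddings, so the loop converges to a sample consistent with the prompt rather than to a known target.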
Drawing on its technical experience in related fields, Qihuoshan quickly grasped the essence of Sora and introduced several innovations in the Etna model. The architectural innovations mainly address the challenges posed by the spatiotemporal nature of video: how to compress video into a latent space along both the spatial and temporal dimensions for efficient denoising, how to convert the compressed latent into patches that can be fed to the Transformer, and how to handle long-range spatiotemporal dependencies while keeping content consistent.
To this end, the Etna model adopts a diffusion architecture for its backbone network and, on a larger dataset, experiments with and adapts a Diffusion + Transformer architecture similar to Sora's. By combining the strengths of diffusion models and Transformers, Etna forms an efficient new architecture that improves generation efficiency while ensuring high quality and consistency in the generated content.
Qihuoshan Company has not only developed multimodal AI products but has also entered strategic partnerships with companies such as Xiaomi and Kuaishou to jointly explore the overseas short drama market.
The capital market has high expectations for Qihuoshan Company, seeing room for growth and investment value. The rise of AI video technology will disrupt the entire short video industry chain, bringing users a new viewing experience and creating more business opportunities and room to grow for upstream and downstream companies.