Sora replacement? Trial addresses announced for StreamingT2V, a free, open-source AI model for 2-minute videos

Recently, Picsart AI Research and collaborating teams released an AI video model called StreamingT2V. The model can generate videos up to 1,200 frames and 2 minutes long, surpassing the previously much-discussed Sora model in duration. StreamingT2V is not only a breakthrough in video length; it is also free and open source, and it is seamlessly compatible with models such as SVD and AnimateDiff, which is of great significance to the open-source ecosystem.


Before Sora, video generation models on the market, such as Pika, Runway, and Stable Video Diffusion (SVD), could usually generate only videos of a few seconds to a dozen or so seconds. Sora set a new industry benchmark with its 60-second generation capability. Now StreamingT2V has not only broken through on duration, but in principle can generate videos of arbitrary length, which opens up more possibilities for the field of video generation.

StreamingT2V's architecture uses an autoregressive approach to create long videos with rich motion dynamics while maintaining temporal consistency and high per-frame image quality. Existing text-to-video diffusion models typically focus on generating high-quality short clips, and when extended to long videos they often suffer from quality degradation, stiff motion, or stagnation. StreamingT2V addresses these problems by introducing a conditional attention module (CAM), an appearance preservation module (APM), and a randomized blending method.
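To make the autoregressive scheme concrete, here is a minimal sketch in Python/NumPy. `generate_chunk` is a hypothetical stand-in for the diffusion model, the simple linear cross-fade approximates the paper's randomized blending, and all shapes, names, and defaults are illustrative rather than the authors' implementation.

```python
import numpy as np

def generate_chunk(prompt, cond_frames, anchor_features, num_frames=16):
    # Hypothetical stand-in for the text-to-video diffusion model. In
    # StreamingT2V the denoising is conditioned via CAM (on cond_frames)
    # and APM (on anchor_features); here we just return random frames.
    rng = np.random.default_rng()
    return rng.random((num_frames, 64, 64, 3))

def generate_long_video(prompt, num_chunks=10, chunk_len=16, overlap=8):
    """Autoregressive long-video generation: each new chunk is conditioned
    on the tail of the previous chunk (short-term memory, CAM) and on
    features of the first chunk (long-term memory, APM)."""
    video = generate_chunk(prompt, None, None, num_frames=chunk_len)
    anchor_features = video[0]  # stand-in for APM anchor-frame features
    for _ in range(num_chunks - 1):
        cond = video[-overlap:]  # conditioning frames for CAM
        chunk = generate_chunk(prompt, cond, anchor_features, chunk_len)
        # Cross-fade over the overlap region to smooth chunk boundaries
        # (a simplification of the paper's randomized blending).
        alpha = np.linspace(0.0, 1.0, overlap)[:, None, None, None]
        blended = (1 - alpha) * video[-overlap:] + alpha * chunk[:overlap]
        video = np.concatenate([video[:-overlap], blended, chunk[overlap:]])
    return video

print(generate_long_video("a rocket launch").shape)  # (88, 64, 64, 3)
```

Because each iteration only appends `chunk_len - overlap` new frames while reusing the previous tail as conditioning, the loop can in principle run indefinitely, which is what allows lengths of 1,200 frames and beyond.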

CAM acts as a short-term memory block: through an attention mechanism, it conditions the chunk currently being generated on the preceding one, achieving consistent chunk transitions. APM acts as a long-term memory block, extracting high-level scene and object features from the first video chunk to prevent the model from forgetting the initial scene. In addition, StreamingT2V uses a high-resolution text-to-video model to autoregressively enhance the generated video, improving its quality and resolution.
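A toy cross-attention block illustrates how such conditioning can be injected. This is a sketch assuming PyTorch, with illustrative dimensions and names; it is not the CAM/APM implementation from the paper.

```python
import torch
import torch.nn as nn

class ConditionalAttention(nn.Module):
    """Toy cross-attention in the spirit of CAM/APM: features of the chunk
    being generated attend to conditioning features (the previous chunk's
    tail for CAM, anchor-frame features for APM)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, cond):
        # x:    (batch, tokens, dim) latent features of the current chunk
        # cond: (batch, tokens, dim) short- or long-term memory features
        out, _ = self.attn(query=x, key=cond, value=cond)
        return x + out  # residual injection of the conditioning signal

# Apply short-term (CAM-like) then long-term (APM-like) conditioning.
x = torch.randn(1, 256, 64)         # current chunk features
cam_cond = torch.randn(1, 128, 64)  # previous chunk's tail frames
apm_cond = torch.randn(1, 32, 64)   # first-chunk (anchor) features
cam, apm = ConditionalAttention(), ConditionalAttention()
x = apm(cam(x, cam_cond), apm_cond)
print(x.shape)  # torch.Size([1, 256, 64])
```

The residual form means the conditioning signal nudges rather than overwrites the current chunk's features, which is one way to keep transitions smooth while still letting each chunk introduce new motion.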

Currently, StreamingT2V has been open-sourced on GitHub and can be tried for free on Hugging Face. Although the server load may be high, users can try generating videos from text and image prompts. The Hugging Face space also shows some successful examples that demonstrate StreamingT2V's video generation capabilities.
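For those who prefer a script to the web demo, trial address 2 below points to a Replicate deployment, which can in principle be called from Python with the official `replicate` client. This is a hedged sketch: the model reference matches the trial address, but the input field names (`prompt`, `num_frames`) are assumptions, so check the model page for the actual schema.

```python
# Requires: pip install replicate, and REPLICATE_API_TOKEN set in the env.
import replicate

output = replicate.run(
    "camenduru/streaming-t2v",  # model ref from trial address 2
    input={
        "prompt": "a camel walking through a desert at sunset",
        "num_frames": 240,  # assumed parameter name for output length
    },
)
print(output)  # typically a URL to the generated video
```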

The release of StreamingT2V not only brings a new technical breakthrough to video generation, but also gives the open-source community a powerful tool, helping to drive the development and application of related technology. In the future we can expect more innovative applications built on such techniques, for example in film production, game development, and virtual world construction.

Paper address: https://arxiv.org/pdf/2403.14773.pdf

Trial address 1: https://huggingface.co/spaces/PAIR/StreamingT2V

Trial address 2: https://replicate.com/camenduru/streaming-t2v
