Stability AI recently launched Stable Video Diffusion, a video generation model that is based on the company's existing Stable Diffusion text-to-image model and generates video by animating existing images. Unlike most AI companies, Stability AI is offering one of the few video generation models available in the open source space.
It should be noted, however, that the model is currently in the "research preview" phase, and users must agree to terms of use that specify its intended applications, such as "educational or creative tools", while prohibiting its use for "representations of real events or people". Given the history of similar AI research previews, it is possible that the model will soon be circulating on the dark web, raising concerns about misuse, especially since it does not appear to have a built-in content filter.
Stable Video Diffusion comes in two models, SVD and SVD-XT. SVD converts still images into 576x1024 videos of 14 frames, while SVD-XT uses the same architecture but raises the frame count to 24. Both can generate video at 3 to 30 frames per second. According to the white paper, both models were initially trained on a dataset of millions of videos and then "fine-tuned" on smaller datasets ranging in size from the hundreds of thousands to the millions.
The four-second video clips generated by the models are of fairly high quality and are considered comparable in some respects to the output of video generation models from Meta, Google, and other AI startups. However, Stable Video Diffusion has some limitations: it cannot generate video without motion or slow camera pans, cannot be controlled by text, cannot render text (at least not legibly), and cannot consistently generate faces and people.
Despite these limitations, Stability AI notes that the models are quite extensible and can be adapted to use cases such as generating 360-degree views of objects. The company plans to release a "range" of models that build on and extend the capabilities of SVD and SVD-XT, as well as a "text-to-video" tool that brings text prompting to the models on the web. The ultimate goal is commercialization: the company sees Stable Video Diffusion as having "potential applications in advertising, education, entertainment, etc.".
However, Stability AI is currently facing financial problems. The company reportedly raised $25 million recently through convertible debt, bringing its total funding to $125 million, but it has not closed a new funding round at a higher valuation; it was last valued at $1 billion. Stability AI had planned to seek a valuation of four times that amount in the coming months, despite the company's low revenues and high burn rate.
Stability AI also saw an executive departure during this period. Ed Newton-Rex, a vice president at the company, said in an open letter that he left over a disagreement about how copyrighted data is used. This was another setback for the company, as Newton-Rex had played a key role in the launch of Stability AI's music generation tool, Stable Audio.
Official demo video: https://www.youtube.com/watch?v=G7mihAy691g