Building on its Stable Diffusion text-to-image model, Stability AI has expanded into the audio field with the launch of Stable Audio Open, which can generate high-quality audio samples from the prompts a user enters.
Stable Audio Open can create audio up to 47 seconds long, making it well suited to drum beats, instrumental melodies, ambient sounds, and sound effects. The open-source model is based on a diffusion transformer (DiT) that operates in the latent space of an autoencoder, improving the quality and diversity of the generated audio.
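To make the latent-diffusion idea concrete, here is a toy sketch of the general pattern: generation happens in a compressed latent space and an autoencoder decoder maps the result back to a waveform. This is purely illustrative; the dimensions, the compression factor, and the placeholder "denoising" loop are invented for the example and do not reflect Stable Audio Open's actual architecture or API.

```python
import numpy as np

# Toy illustration of latent-space audio diffusion (NOT the real
# Stable Audio Open model): a DiT-style denoiser refines a compact
# latent, then a decoder expands it back to audio samples.

rng = np.random.default_rng(0)

SAMPLE_RATE = 44100   # assumed output sample rate
SECONDS = 47          # max clip length reported for the model
LATENT_DIM = 64       # hypothetical latent channel count
DOWNSAMPLE = 1024     # hypothetical autoencoder compression factor

def generate_latent(num_frames, steps=4):
    """Stand-in for the diffusion transformer: start from pure noise
    and iteratively refine it in latent space."""
    z = rng.standard_normal((LATENT_DIM, num_frames))
    for _ in range(steps):
        z = 0.9 * z  # placeholder for a learned denoising update
    return z

def decode(z):
    """Stand-in for the autoencoder decoder: upsample latent frames
    back to a 1-D waveform."""
    return np.repeat(z.mean(axis=0), DOWNSAMPLE)

# Working on ~2,000 latent frames instead of ~2 million raw samples
# is what makes diffusion over long audio tractable.
num_frames = SAMPLE_RATE * SECONDS // DOWNSAMPLE
audio = decode(generate_latent(num_frames))
print(round(audio.shape[0] / SAMPLE_RATE, 2))  # clip length in seconds
```

The design point the sketch shows is the one the article describes: diffusion runs over a short latent sequence, and only the final decode touches the full-rate waveform.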
Stable Audio Open is now open source, and IT Home has attached a relevant link; interested users can try it on Hugging Face. The model is reportedly trained on more than 486,000 samples from libraries such as FreeSound and the Free Music Archive.
Stability AI says: "While it can generate short musical snippets, it is not suitable for full songs, melodies, or vocals."
Stable Audio Open differs from Stable Audio 2.0 in that the former is an open-source model focused on short audio clips and sound effects, while the latter can generate complete tracks up to three minutes long.