Stability AI recently launched a new open-source audio generation model named Stable Audio Open. What sets this model apart is that it can generate up to 47 seconds of stereo audio from a text prompt, at a sampling rate of 44.1 kHz.
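To put those figures in perspective, the following rough Python sketch works out how much raw audio a maximum-length clip represents. The 16-bit sample depth is an assumption made purely for illustration; the figures above specify only the duration, the channel count, and the sampling rate.

```python
# Back-of-the-envelope size of a maximum-length clip.
DURATION_S = 47          # maximum clip length in seconds
SAMPLE_RATE_HZ = 44_100  # samples per second, per channel
CHANNELS = 2             # stereo
BYTES_PER_SAMPLE = 2     # ASSUMPTION: 16-bit PCM, not stated in the article

frames = DURATION_S * SAMPLE_RATE_HZ          # sample frames per channel
total_samples = frames * CHANNELS             # samples across both channels
raw_bytes = total_samples * BYTES_PER_SAMPLE  # uncompressed PCM size

print(f"{frames:,} frames per channel")
print(f"{total_samples:,} samples in total")
print(f"{raw_bytes / 1_000_000:.1f} MB as uncompressed 16-bit PCM")
```

Under that assumption, a full 47-second clip comes to about 2.07 million sample frames per channel, or roughly 8.3 MB of uncompressed audio.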
Unlike many currently popular audio generation models, Stable Audio Open has open weights, which means that anyone can inspect, modify, and extend the model. This openness supports scientific research and gives developers more options. More importantly, the model was trained only on audio files licensed under Creative Commons, which keeps the training data on a sound legal footing, helps avoid potential copyright issues, and reflects close attention to the ethical use of data.
In terms of technical architecture, Stable Audio Open is designed for high-fidelity text-to-audio generation. It produces high-quality stereo audio, giving listeners a clear and realistic sound. During training, the model was exposed to a wide variety of audio samples, which helps it learn a richer range of soundscapes and makes the generated audio more realistic and diverse.
In addition, to check that the new model holds up against the industry's leading models, the development team carried out a comprehensive performance evaluation. On the key evaluation metric, FDopenl3, the researchers found that the model performs well at generating high-quality audio, on par with other strong models in the field. This comparison supports the quality and practicality of Stable Audio Open.
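FDopenl3 is generally described as a Fréchet distance computed between OpenL3 embeddings of generated audio and reference audio, where a lower value means the two sets are statistically closer. The sketch below illustrates the idea in plain Python under a deliberately simplifying assumption: each set of embeddings is summarized by a Gaussian with diagonal covariance, whereas the actual metric uses full covariance matrices. The function name and the toy vectors are hypothetical, and no real OpenL3 embeddings are computed here.

```python
from math import sqrt
from statistics import fmean, pvariance


def frechet_distance_diag(real, generated):
    """Frechet distance between two per-dimension Gaussian fits.

    ASSUMPTION: diagonal covariance, so the distance reduces to a sum
    over dimensions of (mean gap)^2 + (std-dev gap)^2. The real metric
    uses full covariance matrices and a matrix square root.
    """
    total = 0.0
    # zip(*rows) transposes a list of vectors into per-dimension columns.
    for real_dim, gen_dim in zip(zip(*real), zip(*generated)):
        mean_gap = fmean(real_dim) - fmean(gen_dim)
        std_gap = sqrt(pvariance(real_dim)) - sqrt(pvariance(gen_dim))
        total += mean_gap ** 2 + std_gap ** 2
    return total


# Toy 2-dimensional "embeddings" (hypothetical stand-ins for OpenL3 output).
reference = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
close_set = [[0.1, 1.1], [2.1, 3.1], [4.1, 5.1]]
far_set = [[5.0, 5.0], [6.0, 6.0], [7.0, 7.0]]

fd_close = frechet_distance_diag(reference, close_set)
fd_far = frechet_distance_diag(reference, far_set)
print(f"close set: {fd_close:.4f}")
print(f"far set:   {fd_far:.4f}")
```

The set that sits nearer to the reference in both mean and spread gets the smaller score, which is the sense in which a lower FDopenl3 indicates generated audio that more closely resembles the reference distribution.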
With its emphasis on openness and high-quality audio synthesis, Stable Audio Open gives researchers, artists, and developers an important new tool.