Stable Audio Open

The Stable Audio Dev Team is thrilled to introduce Stable Audio Open, an open source model that generates audio samples and sound effects of up to 47 seconds from text prompts. The model lets users create drum beats, instrument riffs, ambient sounds, foley recordings, and other production elements. Because it can also produce audio variations and style transfers, Stable Audio Open is set to empower sound designers, musicians, and creative communities.

What is Stable Audio Open?

Stable Audio Open allows anyone to generate up to 47 seconds of high-quality audio data from a simple text prompt. It is specially trained for creating drum beats, instrument riffs, ambient sounds, foley recordings, and other audio samples ideal for music production and sound design.

A significant advantage of this open source release is the ability for users to fine-tune the model with their own custom audio data. For instance, a drummer can fine-tune the model using their drum recordings to generate new beats.

How is it Different from Stable Audio?

The commercial product, Stable Audio, produces high-quality full tracks of up to three minutes with coherent musical structure, and offers advanced capabilities such as audio-to-audio generation and multi-part musical compositions. In contrast, Stable Audio Open specializes in generating short audio samples, sound effects, and production elements.

While Stable Audio Open can create brief musical clips, it is not optimized for full songs, melodies, or vocals. The open model offers a view into generative AI for sound design, with an emphasis on responsible development alongside creative communities.

The new model was trained on audio data from FreeSound and the Free Music Archive, allowing us to create an open audio model while respecting creators’ rights.

Getting Started

The Stable Audio Open model weights are available on Hugging Face. We invite sound designers, musicians, developers, and audio enthusiasts to download the model, explore its capabilities, and provide feedback.
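
As a rough illustration of what a first experiment might look like, here is a minimal Python sketch. It assumes the open source `stable-audio-tools` package together with `torch` and `torchaudio`, and it assumes the weights live under the Hugging Face repository id `stabilityai/stable-audio-open-1.0`; this article names neither. The helper names `build_conditioning` and `generate_clip` are hypothetical, and the sampler settings are illustrative starting points rather than recommended values. Downloading the weights may also require accepting the model license on Hugging Face first.

```python
from typing import Dict, List, Union

# The article states an upper bound of 47 seconds per generation.
MAX_SECONDS = 47.0

# Assumed Hugging Face repository id; the article does not name it.
ASSUMED_REPO_ID = "stabilityai/stable-audio-open-1.0"


def build_conditioning(
    prompt: str, seconds_total: float = 30.0
) -> List[Dict[str, Union[str, float]]]:
    """Hypothetical helper: build the text and timing conditioning for one clip.

    Requests longer than MAX_SECONDS are clamped to the 47-second limit.
    """
    text = prompt.strip()
    if not text:
        raise ValueError("prompt must not be empty")
    if seconds_total <= 0:
        raise ValueError("seconds_total must be positive")
    return [{
        "prompt": text,
        "seconds_start": 0.0,
        "seconds_total": min(float(seconds_total), MAX_SECONDS),
    }]


def generate_clip(prompt: str, seconds_total: float = 30.0,
                  out_path: str = "output.wav") -> str:
    """Hypothetical helper: generate one clip and write it to a WAV file."""
    # Heavy imports stay local so build_conditioning works without them.
    import torch
    import torchaudio
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Downloads the weights and the model configuration on first use.
    model, model_config = get_pretrained_model(ASSUMED_REPO_ID)
    model = model.to(device)

    audio = generate_diffusion_cond(
        model,
        steps=100,            # illustrative value
        cfg_scale=7,          # illustrative value
        conditioning=build_conditioning(prompt, seconds_total),
        sample_size=model_config["sample_size"],
        sigma_min=0.3,
        sigma_max=500,
        sampler_type="dpmpp-3m-sde",
        device=device,
    )

    # Output shape is (batch, channels, samples): keep the first item,
    # peak-normalize it, and save it at the model's sample rate.
    clip = audio[0].to(torch.float32).cpu()
    clip = clip / clip.abs().max().clamp(min=1e-8)
    torchaudio.save(out_path, clip, model_config["sample_rate"])
    return out_path
```

Under those assumptions, a call such as `generate_clip("128 BPM tech house drum loop", 30)` would write a WAV file to `output.wav`. Clamping the requested duration in `build_conditioning` keeps a request within the 47-second limit described above.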

This release is an exciting step forward, yet it marks only the beginning of our journey towards open and responsible audio generation capabilities. We are committed to ongoing research and development in collaboration with creative communities. Let the open exploration of AI audio begin!

Limitations

Vocal Generation: The model is not able to generate realistic vocals.
Language Support: The model has been trained with English descriptions and will not perform as well in other languages.
Music Styles and Cultures: The model does not perform equally well for all music styles and cultures.
Sound Effects vs. Music: The model is better at generating sound effects and field recordings than music.
Prompt Engineering: It is sometimes difficult to assess what types of text descriptions provide the best generations. Prompt engineering may be required to obtain satisfying results.

We encourage users to explore these limitations and share their feedback to help us improve the model.
