Stable Audio 2.0

audio

by Stability AI · Updated February 15, 2026

Stable Audio 2.0 is Stability AI's latent diffusion model for generating music and sound effects. From a text prompt, it produces 44.1 kHz stereo audio up to 3 minutes long. Built on a diffusion transformer architecture, it is designed to generate coherent musical structure with clearly defined instruments and a polished, professionally produced sound. It is available as an open-source model for local use.

Best For

Music production · Sound effects · Background music · Open-source audio · Local generation

Prompting Tips

1. Be specific about genre, tempo, and instrumentation
2. Describe the production style: "lo-fi", "studio-quality", "live recording"
3. Use music production terms for precise results: "reverb", "stereo width", "warm bass"
4. Specify the mood progression for longer tracks
5. Include BPM and key signature for music theory-aware prompts
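
One way to apply these tips consistently is to assemble the prompt from named fields instead of writing it freehand. The sketch below is illustrative only: `MusicPrompt` and its fields are hypothetical names, not part of any Stability AI tool, and the comma-separated layout is an assumption, not a documented prompt format. It only builds a text string; how that string is passed to the model depends on the interface you use.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MusicPrompt:
    """Hypothetical container for the prompt elements named in the tips."""
    genre: str
    instruments: List[str] = field(default_factory=list)
    production_style: Optional[str] = None  # e.g. "lo-fi", "studio-quality"
    production_terms: List[str] = field(default_factory=list)  # e.g. "warm bass"
    mood_progression: List[str] = field(default_factory=list)  # moods in order
    bpm: Optional[int] = None
    key: Optional[str] = None

    def to_text(self) -> str:
        """Join the fields that were provided into one comma-separated prompt."""
        parts = [self.genre]
        if self.instruments:
            parts.append(", ".join(self.instruments))
        if self.production_style:
            parts.append(self.production_style)
        parts.extend(self.production_terms)
        if self.mood_progression:
            parts.append("mood: " + " to ".join(self.mood_progression))
        if self.bpm is not None:
            parts.append(f"{self.bpm} BPM")
        if self.key:
            parts.append(f"key of {self.key}")
        return ", ".join(parts)


# Example values are made up for illustration.
prompt = MusicPrompt(
    genre="downtempo electronic",
    instruments=["Rhodes piano", "upright bass", "brushed drums"],
    production_style="lo-fi",
    production_terms=["warm bass", "wide stereo reverb"],
    mood_progression=["calm", "uplifting"],
    bpm=90,
    key="A minor",
)
print(prompt.to_text())
```

Keeping each tip as a separate field makes it easy to see which element is missing from a prompt and to vary one element (for example, BPM) while holding the rest fixed.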