Stable Audio 2.0
audio · by Stability AI · Updated February 15, 2026
Stable Audio 2.0 is Stability AI's latent diffusion model for generating music and sound effects. From a text prompt it produces 44.1 kHz stereo audio up to 3 minutes long. Built on a diffusion transformer architecture, it is designed to generate full-length tracks with coherent musical structure, clearly separated instruments, and polished production. Stable Audio 2.0 itself is offered through Stability AI's hosted Stable Audio service rather than as downloadable weights; the openly released model in the family is Stable Audio Open, which generates shorter clips.
Best For
Prompting Tips
1. Be specific about genre, tempo, and instrumentation.
2. Describe the production style: "lo-fi", "studio-quality", "live recording".
3. Use music production terms for precise results: "reverb", "stereo width", "warm bass".
4. Specify the mood progression for longer tracks.
5. Include BPM and key signature for music theory-aware prompts.
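The tips above can be combined into a single prompt string. The following is a minimal sketch, not an official prompt format: `TrackSpec` and `build_prompt` are hypothetical names introduced here for illustration, and the comma-separated layout is an assumption about what reads well as a natural-language prompt.

```python
from dataclasses import dataclass, field


@dataclass
class TrackSpec:
    """Hypothetical container for the details the tips recommend stating."""
    genre: str
    bpm: int
    key: str
    instruments: list[str]
    production_style: str = "studio-quality"
    production_terms: list[str] = field(default_factory=list)
    mood_progression: list[str] = field(default_factory=list)


def build_prompt(spec: TrackSpec) -> str:
    """Join the fields into one comma-separated natural-language prompt."""
    parts = [
        f"{spec.production_style} {spec.genre}",  # tips 1 and 2: style and genre
        f"{spec.bpm} BPM",                        # tip 5: tempo
        f"in {spec.key}",                         # tip 5: key signature
        ", ".join(spec.instruments),              # tip 1: instrumentation
    ]
    parts.extend(spec.production_terms)           # tip 3: production vocabulary
    if spec.mood_progression:                     # tip 4: mood over time
        parts.append("mood moves from " + " to ".join(spec.mood_progression))
    return ", ".join(p for p in parts if p)


example = TrackSpec(
    genre="hip hop",
    bpm=85,
    key="A minor",
    instruments=["electric piano", "brushed drums"],
    production_style="lo-fi",
    production_terms=["warm bass", "wide stereo reverb"],
    mood_progression=["calm", "hopeful"],
)
print(build_prompt(example))
```

Keeping the fields separate makes it easy to vary one element, such as the BPM or the mood progression, while holding the rest of the prompt fixed.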
Syntax & Constraints
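Prompts are written in natural language. Each generation yields up to 3 minutes of 44.1 kHz stereo audio. The model uses a latent diffusion architecture. Stable Audio 2.0 is accessed through Stability AI's hosted service; the open-weights model in the family is Stable Audio Open.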
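To make the output limits concrete, the sketch below turns the stated figures (44.1 kHz, stereo, 3 minutes maximum) into sample counts. The helper names are hypothetical, and the 16-bit PCM size is an assumption used only to give a sense of scale; the source does not state the delivered file format.

```python
SAMPLE_RATE_HZ = 44_100  # from the text: 44.1 kHz
CHANNELS = 2             # from the text: stereo
MAX_SECONDS = 180        # from the text: up to 3 minutes


def frames_for(seconds: float) -> int:
    """Sample frames per channel for a duration, rejecting out-of-range requests."""
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be in (0, {MAX_SECONDS}] seconds")
    return round(seconds * SAMPLE_RATE_HZ)


def pcm16_bytes(seconds: float) -> int:
    """Size as uncompressed 16-bit PCM (assumed format): 2 bytes per sample."""
    return frames_for(seconds) * CHANNELS * 2


print(frames_for(180))   # 7938000 frames per channel
print(pcm16_bytes(180))  # 31752000 bytes, roughly 31.8 MB
```

A full-length generation is therefore about 7.9 million frames per channel, which is useful to know when budgeting storage or checking that a requested duration is within range before submitting a prompt.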
Other Stable Diffusion Models
The most popular open-source image generation model.
Next-gen architecture with improved text and composition.
Efficient multi-stage generation architecture.
Latest Stable Diffusion with 8B parameters for top quality.
Real-time SDXL generation in a single step.
Open-source image-to-video generation model.