Stable Audio 2.0

by Stability AI · Updated February 15, 2026

Stable Audio 2.0 is Stability AI's latent diffusion model for music and sound-effect generation. From a text prompt it produces 44.1 kHz stereo audio up to 3 minutes long. It is built on a diffusion transformer architecture and is aimed at coherent musical structure, with distinct instruments and a polished, produced sound. For local use, Stability AI publishes open weights under the separate Stable Audio Open release, which is limited to shorter clips; Stable Audio 2.0 itself is accessed through Stability AI's hosted service.

Best For

Music production · Sound effects · Background music · Open-source audio · Local generation

Prompting Tips

  1. Be specific about genre, tempo, and instrumentation
  2. Describe the production style: "lo-fi", "studio-quality", "live recording"
  3. Use music production terms for precise results: "reverb", "stereo width", "warm bass"
  4. Specify the mood progression for longer tracks
  5. Include BPM and key signature for music theory-aware prompts
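
One way to apply the tips above consistently is to assemble the prompt from named fields, so that genre, BPM, key, instrumentation, production style, and mood progression are never left out by accident. The sketch below is only an illustration: the `MusicPrompt` class, its field names, and the phrasing it emits are hypothetical and are not part of any Stability AI tooling. It only builds a text string and does not call the model.

```python
from dataclasses import dataclass, field


@dataclass
class MusicPrompt:
    """Hypothetical container for the details the prompting tips recommend."""

    genre: str
    bpm: int
    key: str
    instruments: list[str] = field(default_factory=list)
    production: list[str] = field(default_factory=list)
    mood_progression: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Join the populated fields into one comma-separated text prompt."""
        parts = [self.genre, f"{self.bpm} BPM", self.key]
        if self.instruments:
            parts.append("featuring " + ", ".join(self.instruments))
        if self.production:
            parts.append(", ".join(self.production))
        if self.mood_progression:
            # Describe how the mood should evolve over a longer track.
            parts.append("mood moves from " + " to ".join(self.mood_progression))
        return ", ".join(parts)


prompt = MusicPrompt(
    genre="lo-fi hip hop",
    bpm=82,
    key="D minor",
    instruments=["dusty drum loop", "electric piano", "warm bass"],
    production=["studio-quality", "wide stereo width", "light reverb"],
    mood_progression=["calm", "hopeful"],
)
print(prompt.render())
```

Keeping the fields separate makes it easy to vary one element at a time (for example, only the BPM) and compare the resulting generations.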

Syntax & Constraints

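Prompts are written in natural language. Each generation produces up to 3 minutes of 44.1 kHz stereo audio. The model uses a latent diffusion architecture. Open weights for local use are published by Stability AI as the separate Stable Audio Open release.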
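
To make the stated limits concrete, the following sketch clamps a requested duration to the 3-minute ceiling and estimates how large the decoded audio would be. The sample rate, channel count, and maximum length come from the constraints above; the 16-bit PCM depth is an assumption chosen for illustration, since the page does not say what bit depth or file format the output uses. The function names are hypothetical helpers, not part of any Stability AI API.

```python
SAMPLE_RATE_HZ = 44_100  # 44.1 kHz, from the constraints above
CHANNELS = 2             # stereo
MAX_SECONDS = 180        # 3-minute ceiling


def clamp_duration(seconds: float) -> float:
    """Return a duration that fits within the 3-minute limit."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return min(seconds, MAX_SECONDS)


def pcm_size_bytes(seconds: float, bit_depth: int = 16) -> int:
    """Estimate uncompressed PCM size; the 16-bit default is an assumption."""
    frames = round(seconds * SAMPLE_RATE_HZ)
    return frames * CHANNELS * (bit_depth // 8)


duration = clamp_duration(240)       # a 4-minute request is capped at 180 s
size = pcm_size_bytes(duration)      # 7,938,000 frames * 2 channels * 2 bytes
print(duration, size)
```

Under that 16-bit assumption, a full-length track comes to 31,752,000 bytes, roughly 32 MB of uncompressed audio.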
