Foundry Is Now a Music and Speech Studio


When Foundry launched, it was a music generation studio. You typed a description, it produced a full song with vocals and instruments, and then you could edit, layer, and mix the result. That part is still there and stronger than ever.

What changed: Foundry now generates speech too.

What speech generation looks like in practice

You write a script. You assign voices to roles. You set the emotional tone per line or per paragraph. Foundry produces spoken audio that sounds like a real person read it, with the feeling you asked for.

There are 40 emotions to choose from, each with 5 intensity levels. The range runs from whisper and calm authority to joy, fury, heartbreak, and nervous energy. You pick what fits and dial in how much. The voice stays consistent across the entire read. It does not drift into a different character halfway through.

60 built-in speaker presets cover a wide range of vocal types. If none of them match what you need, voice cloning lets you create a custom voice from a short audio sample. That cloned voice then works with the same emotion controls as everything else.
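
To make the script, role, and emotion model concrete, here is a minimal sketch of how such a script could be represented as data. It is an illustration only, not Foundry's project format or API: the names ScriptLine, resolve_voices, and the voice identifiers are hypothetical, the emotion set contains only the six examples named above, and numbering the intensity levels 1 to 5 is an assumption.

```python
from dataclasses import dataclass

# Only the example emotions named in the text; the full set of 40 is not listed there.
KNOWN_EMOTIONS = {
    "whisper", "joy", "fury", "heartbreak", "calm authority", "nervous energy",
}
# Five intensity levels; numbering them 1..5 is an assumption for this sketch.
INTENSITY_LEVELS = range(1, 6)


@dataclass(frozen=True)
class ScriptLine:
    """One spoken line: who says it, what they say, and how it should feel."""
    role: str
    text: str
    emotion: str
    intensity: int

    def __post_init__(self) -> None:
        if self.emotion not in KNOWN_EMOTIONS:
            raise ValueError(f"unknown emotion: {self.emotion!r}")
        if self.intensity not in INTENSITY_LEVELS:
            raise ValueError(f"intensity must be 1-5, got {self.intensity}")


def resolve_voices(script, casting):
    """Pair every line with the voice cast for its role; fail on uncast roles."""
    missing = sorted({line.role for line in script} - set(casting))
    if missing:
        raise KeyError(f"no voice assigned to: {', '.join(missing)}")
    return [(casting[line.role], line) for line in script]


# Hypothetical two-role scene: one preset voice, one cloned voice.
script = [
    ScriptLine("narrator", "The door creaked open.", "calm authority", 2),
    ScriptLine("mara", "Who's there?", "nervous energy", 4),
]
casting = {"narrator": "preset_12", "mara": "cloned_voice_a"}

for voice, line in resolve_voices(script, casting):
    print(f"{voice}: [{line.emotion} x{line.intensity}] {line.text}")
```

Keeping the casting separate from the script means a role can be recast, for example from a preset to a cloned voice, without touching the lines or their emotion settings.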

Multi-speaker scenes

Audiobooks have multiple characters. Podcasts have co-hosts. Dialogues need distinct voices that interact naturally. Foundry handles all of this. You assign different voices to different roles, and the output keeps each one separate and recognizable.

A full audiobook chapter with three characters, each with their own emotional arc, generates at up to 15x real-time speed. On a 12 GB NVIDIA card, that means a 10-minute scene is ready in under a minute: at the full 15x rate, 600 seconds of audio takes about 40 seconds to generate.
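
The arithmetic behind that figure is simple enough to check for your own material. The helper below is a back-of-the-envelope sketch; render_seconds is a hypothetical name, and since 15x is stated as an upper bound, real runs may be slower.

```python
def render_seconds(audio_minutes: float, speedup: float) -> float:
    """Estimate generation time for a clip at a given real-time multiple."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_minutes * 60 / speedup


# A 10-minute scene at the peak 15x rate.
print(render_seconds(10, 15))  # 40.0
```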

Music and speech on the same timeline

This is where the combination gets interesting. You generate a narration track. You generate a background score. You drag both onto the timeline. You trim, fade, adjust levels, and export a finished production.
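
For readers curious what fading and level adjustment mean numerically, here is a generic sketch of the underlying operations on raw sample values. It is standard audio arithmetic, not Foundry's implementation: the function names are hypothetical, samples are assumed to be floats between -1.0 and 1.0, and the -12 dB default for the music bed is an arbitrary choice.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def fade_in(samples, fade_len):
    """Ramp the first fade_len samples linearly up from silence."""
    n = min(fade_len, len(samples))
    out = list(samples)
    for i in range(n):
        out[i] = samples[i] * (i / n)
    return out


def mix(narration, music, music_db=-12.0):
    """Sum narration with a quieter music bed, clipping to [-1.0, 1.0]."""
    gain = db_to_gain(music_db)
    length = max(len(narration), len(music))
    out = []
    for i in range(length):
        a = narration[i] if i < len(narration) else 0.0
        b = music[i] * gain if i < len(music) else 0.0
        out.append(max(-1.0, min(1.0, a + b)))
    return out


# Tiny example: four-sample music bed faded in under a narration track.
music = fade_in([0.8, 0.8, 0.8, 0.8], fade_len=4)
mixed = mix([0.2, 0.2, 0.2, 0.2], music)
print(len(mixed))  # 4
```

Hard clipping is the crudest way to keep the sum in range; a real mixer would more likely rely on headroom or a limiter.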

Podcast intros with custom music beds. Game trailers with character dialogue over an original score. Audio dramas with layered sound design. All built inside one application, all running on your hardware.

Everything else is still here

Text-to-music generation with the caption builder. Patch editing to fix sections without starting over. Stem separation into 7 tracks. Cover and restyle workflows. 32 DSP effects with 200+ presets. The Creative AI for writing assistance. The full timeline editor with multi-track mixing.

Speech did not replace any of that. It made Foundry something broader: a complete local audio production environment for both music and voice.