Foundry Is Now a Music and Speech Studio


When Foundry launched, it was a music generation studio. You typed a description, it produced a full song with vocals and instruments, and then you could edit, layer, and mix the result. That part is still there and stronger than ever.

What changed: Foundry now generates speech too.

What speech generation looks like in practice

You write a script. You assign voices to roles. You set the emotional tone per line or per paragraph. Foundry produces spoken audio that sounds like a real person read it, with the feeling you asked for.

Foundry offers 40 emotions, each with 5 intensity levels. Examples include whisper, joy, fury, heartbreak, calm authority, and nervous energy. You pick what fits and dial in how much. The voice stays consistent across the entire read and does not drift into a different character halfway through.

60 built-in speaker presets cover a wide range of vocal types. If none of them match what you need, voice cloning lets you create a custom voice from a short audio sample. That cloned voice then works with the same emotion controls as everything else.

Multi-speaker scenes

Audiobooks have multiple characters. Podcasts have co-hosts. Dialogues need distinct voices that interact naturally. Foundry handles all of this. You assign different voices to different roles, and the output keeps each one separate and recognizable.
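One way to picture a multi-speaker scene is as a list of script lines, each tagged with a role, an emotion, and an intensity, plus a casting table that maps roles to voices. The sketch below is only an illustration of that idea, not Foundry's project format or API: `ScriptLine`, `validate_script`, the voice names, and the 1 to 5 intensity scale are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; NOT Foundry's real format or API.
INTENSITY_LEVELS = range(1, 6)  # assumed 1..5 scale for the 5 intensity levels


@dataclass(frozen=True)
class ScriptLine:
    role: str       # character or host speaking the line
    text: str       # what is said
    emotion: str    # emotional direction for this line
    intensity: int  # how strongly the emotion is applied


def validate_script(lines, casting):
    """Return a list of problems; an empty list means the script is consistent."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.role not in casting:
            problems.append(f"line {number}: role '{line.role}' has no voice assigned")
        if line.intensity not in INTENSITY_LEVELS:
            problems.append(f"line {number}: intensity {line.intensity} is outside 1-5")
    return problems


# Made-up voice identifiers: one built-in preset and one cloned voice.
casting = {"Narrator": "preset_warm_01", "Mara": "clone_mara"}

script = [
    ScriptLine("Narrator", "The door opened slowly.", "calm authority", 2),
    ScriptLine("Mara", "Who's there?", "nervous energy", 4),
]

print(validate_script(script, casting))
```

Checking roles against the casting table before generation is the kind of step that keeps each voice separate: every line resolves to exactly one assigned voice, or the problem is reported up front.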

A full audiobook chapter with three characters, each with their own emotional arc, generates at up to 15x real-time speed. On a 12 GB NVIDIA card, that means a 10-minute scene can be ready in under a minute (about 40 seconds at the full 15x).
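The speed claim is simple arithmetic: generation time is audio length divided by the real-time factor. A minimal sketch, assuming the factor holds steady for the whole scene (the function name `generation_seconds` is ours, and the 15x figure is the stated best case, not a guaranteed rate):

```python
def generation_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Time needed to generate audio at a given multiple of real-time speed."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


scene_seconds = 10 * 60  # a 10-minute scene
best_case = generation_seconds(scene_seconds, 15.0)
print(best_case)  # 40.0
```

By the same formula, any sustained factor above 10x keeps a 10-minute scene under one minute.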

Music and speech on the same timeline

This is where the combination gets interesting. You generate a narration track. You generate a background score. You drag both onto the timeline. You trim, fade, adjust levels, and export a finished production.
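For readers curious what "fade, adjust levels, and mix" amounts to numerically, here is a minimal sketch on plain lists of samples in the range -1.0 to 1.0. It illustrates the general operations only and makes no claim about how Foundry's timeline is implemented; the function names and the linear fade shape are assumptions.

```python
def apply_gain(samples, gain):
    """Scale every sample by a constant level (for example 0.3 for a quiet music bed)."""
    return [s * gain for s in samples]


def fade_in(samples, fade_length):
    """Ramp linearly from silence to full level over the first fade_length samples."""
    n = min(fade_length, len(samples))
    return [s * (i / n) if i < n else s for i, s in enumerate(samples)]


def mix(track_a, track_b):
    """Sum two tracks sample by sample, padding the shorter one with silence."""
    length = max(len(track_a), len(track_b))
    a = track_a + [0.0] * (length - len(track_a))
    b = track_b + [0.0] * (length - len(track_b))
    # Clamp to the valid range so the sum cannot exceed full scale.
    return [max(-1.0, min(1.0, x + y)) for x, y in zip(a, b)]


narration = [0.5, 0.5, 0.5, 0.5]
music = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

# Music bed: fade in over two samples, then pull the level down under the voice.
bed = apply_gain(fade_in(music, 2), 0.25)
production = mix(narration, bed)
print(production)
```

Real audio would use many thousands of samples per second and usually a smoother fade curve, but the order of operations (shape each track, then sum) is the same.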

Podcast intros with custom music beds. Game trailers with character dialogue over an original score. Audio dramas with layered sound design. All built inside one application, all running on your hardware.

Everything else is still here

Text-to-music generation with the caption builder. Patch editing to fix sections without starting over. Stem separation into 7 tracks. Cover and restyle workflows. 32 DSP effects with 200+ presets. The Creative AI for writing assistance. The full timeline editor with multi-track mixing.

Speech did not replace any of that. It made Foundry something broader: a complete local audio production environment for both music and voice.
