Demodokos Foundry V4: Biggest Update Since Launch


The Demodokos Foundry V4 update is the largest release since launch. It rebuilds the speech engine, widens GPU support, refreshes the entire interface, and makes setup far more transparent. The short version: better voices, on more machines, in a studio that finally looks and feels like one.

Key takeaways

  • The new V4 Speech model keeps a voice consistent all the way through a clip, across every style and intensity, so it stops drifting or breaking character mid-line.
  • Foundry runs on more hardware now: AMD and Vulkan support got a real upgrade, and a new Medium model runs on as little as 4GB of VRAM.
  • The mixer and track visuals were rebuilt from the ground up, so you can finally see what your audio is doing, at much higher performance.
  • Setup is more reliable: the Package Manager now shows downloads, retries, and model inventory clearly instead of stalling silently.
  • Everything still runs 100% locally on your own GPU. No cloud, no uploads, no credits. Still $12/month.

What is new in the Demodokos Foundry V4 update?

Foundry has always been a local AI audio suite for music, speech, voice cloning, and timeline editing that runs on your own Windows PC. V4 does not change that foundation. It sharpens almost every part of it. Here is what actually changed, and what each change means for the work you are trying to ship.

Area          | What changed                  | What it means for you
Speech engine | New V4 Speech model family    | Voices hold character across styles and intensities
Hardware      | Medium model for low VRAM     | Runs on GPUs with as little as 4GB
Hardware      | Better AMD / Vulkan handling  | Cleaner detection and setup when CUDA is not available
Interface     | Full UI and branding refresh  | Feels like a finished studio, not a beta
Mixer         | Rebuilt track visuals         | See your audio in detail, at higher performance
Setup         | Improved Package Manager      | Downloads, retries, and model inventory are visible
Music         | Creative AI on-demand loading | Faster startup, richer generation controls

What is the V4 Speech model?

The V4 Speech model is Foundry's new voice generation engine, and its main improvement is speaker consistency. It holds the same voice and the same character throughout a clip, even when the delivery shifts from calm to angry to a whisper. Earlier models could drift partway through a long line, which meant regenerating and hoping the next take held together. V4 is built specifically to stop that.

This matters most for anyone producing at length. If you narrate audiobooks, voice a cast of game characters, or run a faceless channel, a voice that stays in character across an entire paragraph saves you the tedious loop of regenerate, listen, regenerate again. The new native speech and TTS stack also improved batching, queueing, and the speech document workflow, so long jobs move more smoothly and stall less often.

The update also ships a refreshed demo voice library built to the new V4 quality bar, plus a new v4 zvoice format for cloned and custom voices. If you have built voices in an older format, the new lineup gives you a clean reference point for what the current system can do.

Can you run Demodokos Foundry on 4GB VRAM?

Yes. The V4 update adds a Medium model that runs on hardware with as little as 4GB of VRAM. Before this, the practical floor was higher, which shut out a lot of laptops and older cards. The Medium model brings the speech system to machines that could not load the larger models at all.

This is not a stripped demo. It is a genuine option for people who do not own a high-end GPU and were told, essentially, to come back when they upgraded. If your card sat just under the old requirement, V4 is worth a fresh look. You trade some quality and speed for the ability to actually run it, which is a trade a lot of people will happily take.
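
To make that trade concrete, here is a minimal Python sketch of how a VRAM-aware model picker could work. It is not Foundry's actual code. The function name `pick_model`, the "Large" tier, and its 8GB figure are assumptions made up for illustration; only the 4GB floor for the Medium model comes from this release.

```python
from typing import List, Optional, Tuple

# Illustrative tiers as (name, minimum VRAM in GB), largest first.
# Only the 4 GB floor for "Medium" comes from the V4 notes; the
# "Large" entry and its 8 GB figure are hypothetical placeholders.
MODEL_TIERS: List[Tuple[str, float]] = [
    ("Large", 8.0),
    ("Medium", 4.0),
]


def pick_model(
    vram_gb: float, tiers: List[Tuple[str, float]] = MODEL_TIERS
) -> Optional[str]:
    """Return the largest tier whose VRAM floor fits, or None if none fit."""
    for name, min_vram_gb in tiers:
        if vram_gb >= min_vram_gb:
            return name
    return None


if __name__ == "__main__":
    for vram in (3.0, 4.0, 6.0, 12.0):
        print(f"{vram} GB -> {pick_model(vram)}")
```

The tiers are ordered largest first so the picker always prefers the best model the card can hold and only falls back to Medium when it has to.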

Does Demodokos Foundry work on AMD GPUs?

AMD and Vulkan support got a real upgrade in V4. The update improves AMD and Vulkan detection, VRAM reporting, backend verification, and how setup behaves when CUDA is not available. In plain terms, if you are on an AMD card, Foundry does a much better job of recognizing your hardware and configuring itself correctly instead of assuming an NVIDIA CUDA path.

Foundry is still tuned first for Windows with an NVIDIA GPU, and that remains the smoothest experience. But the Vulkan work in V4 widens the door for people who were previously stuck at the setup screen. If a Vulkan backend failed to verify for you in the past, this is the release to try again on.
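
The idea of verifying a backend before committing to it can be sketched in a few lines of Python. This is an illustration of the general pattern, not Foundry's implementation: `select_backend`, the backend names, and the probe functions are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Optional


def select_backend(
    preferred: List[str],
    probes: Dict[str, Callable[[], bool]],
) -> Optional[str]:
    """Return the first backend in `preferred` whose probe verifies.

    A probe that raises is treated as a failed verification, so one
    missing runtime does not abort the whole selection.
    """
    for name in preferred:
        probe = probes.get(name)
        if probe is None:
            continue  # no way to verify this backend, skip it
        try:
            if probe():
                return name
        except Exception:
            continue
    return None


if __name__ == "__main__":
    # Stand-in probes for a machine with no CUDA runtime but working Vulkan.
    def cuda_probe() -> bool:
        raise RuntimeError("CUDA runtime not found")

    probes = {"cuda": cuda_probe, "vulkan": lambda: True}
    print(select_backend(["cuda", "vulkan"], probes))
```

Trying CUDA first and falling through to Vulkan mirrors the order described above: NVIDIA remains the primary path, and other cards get a verified alternative instead of a dead end.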

What changed in the mixer and interface?

V4 delivers a full interface refresh and a rebuilt mixer with new track visuals. The visuals are graded from Simplified to Accurate, so you can pick a lightweight view or a detailed one, and both run at significantly higher performance than before. You can finally see what a track is doing in real time rather than trusting your ears alone.

The wider interface refresh cleaned up window chrome, visual feedback, and the overall desktop experience. None of this changes what Foundry can produce, but it changes how it feels to work in for hours at a stretch. A tool you stare at all day should look like it respects your time, and V4 moves the whole app closer to that.

Is setup more reliable now?

Setup and startup are noticeably more reliable in V4. The improved Package Manager now surfaces downloads, retries, model inventory, and runtime setup clearly, instead of running hidden blocking downloads in the background. When something needs to download or a step fails, you see it and you can act on it.
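
For readers curious what "visible retries" looks like in practice, here is a minimal Python sketch of the general pattern: every attempt, failure, and wait is reported instead of happening silently. It is an assumption-laden illustration, not the Package Manager's code. The function name, the `fetch` and `report` callbacks, and the exponential backoff are all choices made for this example.

```python
import time
from typing import Callable


def download_with_retries(
    fetch: Callable[[], bytes],
    report: Callable[[str], None],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `fetch` until it succeeds, reporting every attempt and failure."""
    for attempt in range(1, max_attempts + 1):
        report(f"attempt {attempt}/{max_attempts}: downloading")
        try:
            data = fetch()
        except OSError as err:
            report(f"attempt {attempt}/{max_attempts}: failed ({err})")
            if attempt == max_attempts:
                raise  # out of attempts, surface the last error
            delay = base_delay_s * 2 ** (attempt - 1)  # exponential backoff
            report(f"retrying in {delay:.0f}s")
            sleep(delay)
        else:
            report(f"done: {len(data)} bytes")
            return data
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    # Fake download that fails twice, then succeeds.
    outcomes = iter([OSError("timeout"), OSError("reset"), b"model-bytes"])

    def fake_fetch() -> bytes:
        result = next(outcomes)
        if isinstance(result, Exception):
            raise result
        return result

    download_with_retries(fake_fetch, print, sleep=lambda seconds: None)
```

Passing `report` in as a callback is the point of the sketch: the same loop can feed a log file, a progress panel, or a test, so nothing about the download is hidden from the user.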

Startup readiness also improved, so the app is less likely to report itself ready before its core services actually are. Alongside that, the release fixes a wide range of edge-case bugs across installation, model loading, document editing, style generation, and GPU or backend selection. The result is fewer moments where you are left guessing whether the app is working or quietly stuck.
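
A readiness check of this kind can be as simple as refusing to say "ready" while any core service is still pending. The Python sketch below shows that idea only; the service names and both function names are hypothetical and not taken from Foundry.

```python
from typing import Dict, List


def pending_services(status: Dict[str, bool]) -> List[str]:
    """Names of services that have not reported ready yet, sorted."""
    return sorted(name for name, ready in status.items() if not ready)


def app_is_ready(status: Dict[str, bool]) -> bool:
    """Ready only when at least one service is tracked and none are pending."""
    return bool(status) and not pending_services(status)


if __name__ == "__main__":
    # Hypothetical service names, for illustration only.
    status = {"speech": True, "package_manager": True, "gpu_backend": False}
    print(app_is_ready(status), pending_services(status))
```

Returning the list of pending services, not just a yes or no, is what lets an app tell you what it is still waiting on.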

Music and Creative AI improvements

V4 continues to expand Foundry's music and creative tooling. Creative AI now loads on demand, which trims startup time and frees resources until you actually need the assistant. Caption and lyric workflows improved, generation controls got richer, and project and preset handling is more robust, including better reset, repair, append-track, and clip library behavior.
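
On-demand loading is a standard pattern, and a small Python sketch shows why it helps startup. This is a generic illustration under stated assumptions, not how Foundry loads Creative AI: the `OnDemand` class and the placeholder loader are invented for this example.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class OnDemand(Generic[T]):
    """Defer an expensive load until the first call to get()."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._value: Optional[T] = None
        self._loaded = False

    @property
    def loaded(self) -> bool:
        return self._loaded

    def get(self) -> T:
        """Load on first use, then return the cached value."""
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        return self._value  # type: ignore[return-value]

    def unload(self) -> None:
        """Drop the value so its memory can be reclaimed."""
        self._value = None
        self._loaded = False


if __name__ == "__main__":
    # Placeholder loader standing in for an expensive model load.
    assistant = OnDemand(lambda: "assistant-model")
    print(assistant.loaded)   # nothing has been loaded at startup
    print(assistant.get())    # first use triggers the load
    print(assistant.loaded)
```

Nothing is loaded at construction time, so startup pays no cost for a feature you may not open in a given session.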

If you build full tracks in Foundry, these changes reduce friction across a session: cleaner project management, more control over what the generator does, and a Creative AI writing partner that stays out of the way until you call on it. For a closer look at how music generation fits into the wider suite, see how local AI music generation compares to Suno and Udio.

Why does local AI audio still matter here?

Every improvement in V4 lands on top of the same principle Foundry started with: it all runs on your machine. Your voice samples, your scripts, your generated audio, none of it is uploaded to anyone's server. There are no cloud queues, no credit meters, and no per-song charges. You generate as much as you want, as often as you want.

That is the part cloud tools cannot match no matter how good their models get. A better speech engine is only better if you also keep control of what you feed it. V4 makes the voices stronger and brings the app to more hardware, without asking you to hand over your files to get there. For the full picture of what the suite does end to end, see everything Demodokos Foundry can do.

Frequently asked questions

Do I have to pay extra for the V4 update?

No. The V4 update is included in your existing subscription. Foundry is still $12/month on the Creator plan, and there is a 7-day free trial through PayPal with no charge on day one.

Will my existing voices and projects still work?

Yes. V4 adds a new v4 zvoice format and a refreshed demo voice library, but your existing projects continue to work. The new format is a forward step for cloned and custom voices, not a replacement that breaks what you already built.

What are the minimum requirements after the update?

Foundry runs on Windows 10 or 11 with an NVIDIA GPU as the primary path. The new Medium model runs on as little as 4GB of VRAM, and AMD and Vulkan support improved for cards where CUDA is not available.

Does the new speech model fix voices drifting mid-clip?

That is exactly what it targets. The V4 Speech model is built to keep a voice consistent across styles and intensities, so it holds character through longer lines instead of drifting partway through.

How do I get the update?

Download the stable setup from demodokos.com. It pulls the latest version straight from Demodokos, and if you already have Foundry installed, it updates to the current build for you.

Try the V4 update

If you are already a Foundry user, update and start with the new speech model on a long line you know used to drift. That is where the difference shows up fastest. If you have not tried Foundry yet, or your GPU used to be too small to run it, V4 is a fair reason to take another look. Hear what local AI audio sounds like on your own machine. No charge today, cancel anytime during the trial.

