Human Performance Still Trumps AI Voice Models

ElevenLabs says that their new voice models are as good as humans. I still think without us, they’re nothing.

To get AI voices out of the uncanny valley, the best tool we have right now isn’t a better prompt. It's a human performance.

Speech-to-Speech (STS) workflows allow an actor to perform a read, then map an AI voice over top of it. This method IS better than straight Text-to-Speech because it has something real to imitate. A weird pause. A slight crack in the voice. Somebody landing a joke half a beat later than expected. Tiny things people don’t consciously hear, but absolutely notice.

In that setup, the human is the actor. The AI voice is just the costume.

But there’s still a problem. As soon as the audio hits the model, the system often starts smoothing things out. It’s trying to create clean, mathematically stable sound, which means it will flatten the stuff that makes performance interesting in the first place. A whisper loses some texture. A shout loses some edge. Fewer peaks and valleys mean less dynamic range overall. So we can input a great performance and still get back a less-than-great version of it.

Then there’s the studio reality of all this: If we like 99% of a voice actor’s read but need one word slightly harder, softer, slower, faster... that’s usually a five-second fix. With current STS workflows, this becomes a ridiculous process. If one word sounds weird, or the model interpreted something strangely, now we have to either
• re-record and hope it reacts differently this time, or
• open Pro Tools and surgically manipulate waveforms

Which brings us to the larger point: People talk about "realistic AI performances", but a lot of the time it’s just a human performance wearing a mask. AI can change the timbre, accent, age, gender, whatever. But it still needs a human performance underneath it.

There's an Ipsos/Syracuse study that highlighted this. AI voices can hold attention pretty well. But human-voiced ads over-indexed on short-term sales by 11 points while AI voices under-indexed by 5. Turns out humans still enjoy listening to humans. I am not surprised.

If you’re trying to make people feel something, start with a human. Collaborating with voice actors is still superior, both creatively and commercially. And it’s far more interesting than talking to a machine.

5 Comments

Tim White 5d

For anyone interested, the Ipsos/Syracuse research is here :-) https://newhouse.syracuse.edu/news/ai-ads-are-almost-indistinguishable-from-human-made-work-they-just-dont-perform-as-well/

Ray Simonson 1d

Very useful information. Thanks

Steve Elliot DTM 5d

Wow! Great insights here and one that I hadn't considered. The smoothing out of the imperfections, imperfections that are actually strengths. Funnily, working at a Starbucks with a drive thru I have had customers act surprised at the window. Saying "I thought I was talking to AI!" I guess I am AI that hasn't been smoothed out yet. LOL!

Neil Murchison 5d

100%. 👍🏻

Wendy Lands Voiceover Pro 5d

Keep spreading the word!!


More Relevant Posts

北嶋友暁

Founder, VoxBridgia / Phiomn Co., Ltd. | AI Dubbing Expert | Multilingual TTS / AI Voice Data / Dubbing & Localization | Global Voice Talent Network | VMEG Authorized Strategic Partner in Japan | JapanaDub Studio
2w
“It is not perfect yet, so we should not use it.” Every time a new technology appears, we hear this phrase. The same is now being said about generative voice AI and TTS. Of course, today’s AI voices still have challenges: pronunciation, emotion, timing, dialects, pauses, and the natural rhythm of conversation. There are still areas where human performance is superior. But I do not believe “not perfect yet” is a reason to keep new technology at a distance. History has always started with imperfect technology. When the world began shifting from horse-drawn carriages to automobiles, most people did not immediately believe in the future of cars. Many probably wanted a faster horse. But Ford saw a different future. He understood something deeper than the automobile itself: Once humanity discovers a new level of convenience and possibility, it rarely goes back. New technology is not beautiful or complete at the beginning. It is rough, unstable, unfamiliar, and often criticized. But once it crosses a certain line, adoption accelerates dramatically. Cars, the internet, smartphones, digital video, and CG all followed this path. I believe generative voice AI is now standing at the entrance of the same transformation. Today, many people still ask: “Is this a human voice, or an AI voice?” But when enough data is learned, when expressiveness improves, and when naturalness reaches a certain level, that question may begin to lose its meaning. The real question will not be whether it is AI or human. The real question will be: Is it good? Does it move people? Does it increase the value of the content? Does it create a natural and rich experience for the listener? This does not mean human expression will become unnecessary. In fact, I believe the value of human acting, emotion, interpretation, judgment, and creative direction will become even more important. What will change is not the value of human expression. What will change is how we work with technology. 
Do we reject AI as a threat? Or do we accept it as a new creative tool that expands what human expression can do? History shows that the people who build the next era are those who understand change early, test it, and integrate it into their work. The world of voice is now at a major turning point. When enough data is learned and quality crosses the line, the world will change. The important thing is not to panic later, but to understand, test, and prepare now. Generative voice AI is not complete yet. But it is clearly moving toward the next stage. And once society understands its possibilities, it will not fully return to where it was before. The industrial revolution of voice has already begun.
Kenechukwu Ikebuaku, PhD
3w
Join Mozisha’s Senior Visual Designer, Marvin Walere, and Mariam Amusa from the Office of the Honourable Minister, Federal Ministry of Communications, Innovation & Digital Economy, as they explore why African creatives have the edge AI can't train out.
Mozisha

3w Edited

The Mozisha AI Series begins. Episode 01: Creatives. Everything sounds the same now. AI tools trained on the same data, producing the same global accent. So where does that leave African creatives? Turns out, in the strongest position they've been in for a long time. Join our Senior Visual Designer Marvin Walere, and Mariam Amusa from the Office of the Honourable Minister, Federal Ministry of Communications, Innovation and Digital Economy, as they explore why cultural specificity is the edge AI can't train out. Thursday 7 May 2026, 3:00 PM WAT, on Google Meet. AI handles what can be specified. Humans handle what has to be felt. Register link in comments.
Kevin Chang
3w
The true significance of the Golden Globe Awards' latest AI rules is not “relaxing AI restrictions,” but rather that mainstream awards are beginning to acknowledge AI as an integral part of the film and television production process. The logic behind these rules is not a blanket prohibition, but a requirement for works to disclose their use of AI and to confirm that core creative elements and performances remain human-led. This marks the transition of AI films from the stage of technological experimentation into institutional review. In the future, whether a work can compete for awards will depend not only on visual quality, but also on the transparency of its creative origins, actor authorizations, and human contributions. Compared to the Oscars’ more defensive regulations, the Golden Globes have adopted a more flexible governance model, reflecting that the mainstream industry is seeking a balance between AI innovation and creators’ rights. #AIFilms #AIFilmmaking #AIinFilmIndustry #Variety #GoldenGlobes https://lnkd.in/giuMm5mC

Golden Globes Set AI Rules: ‘AI Doesn’t Automatically Disqualify’ a Movie or Show as Long as ‘Human Creative Direction’ Is ‘Primary Throughout Production’ https://variety.com
Jolly G.
3w
Normal #agent = using the AI system
Grokking agent = understanding the brain and workflow behind the AI system

Imagine this: “My grandmother feels lonely. Create something that makes her smile every morning.”

#Normal AI Agent could:
• Collect old family photos
• Enhance and colorize them
• Generate a soft emotional voice message
• Create a short memory video with music
• Send it to her every morning automatically

It works like a smart assistant following instructions.

#Grokking the Agent
This means deeply understanding how the agent is working internally. You understand:
• how it decides which photos to use
• how memory works
• how voice generation is selected
• how multiple tools connect together
• how planning happens step-by-step
• how different agents communicate

That’s a beautiful example of AI agents: not just answering questions, but understanding emotions, planning tasks, using tools, and creating meaningful experiences automatically.
Kornelia Lencz
1w
"𝘌𝘷𝘦𝘳𝘺𝘰𝘯𝘦, 𝘮𝘦𝘦𝘵 𝘊𝘭𝘢𝘶𝘥𝘦." AI-generated scenes of old, discontinued TV shows feel ethically ambiguous. Creators and actors have raised serious and valid objections to this kind of content (likeness rights, creative ownership, the absence of consent, etc.). Seeing this video just made me want to go back and rewatch the original episodes, so it functioned like an ad. These are creative works of artists who of course have rights. But do the fictional worlds they depict still belong to the creators, once millions of people have made them part of their own emotional realms? Is this a similar question to whether the public has a right of visibility into the private lives of people in the public eye? Not from a legal or ethical reasoning standpoint, but in terms of the psychology driving both, it’s the same. Emotional investment consistently generates a subjective sense of ownership, independent of any legal or material claim. Humans can therefore genuinely and reasonably feel that some things (even people) belong to them, when in reality they don’t. Can this kind of generative AI content be placed in the same category as fan fiction? Why or why not? The purpose is identical – to satisfy a yearning for an alternative storyline, an imaginary plot, or simply more of the same. We can all see the many ways in which this can be taken too far. But does it (artificially bringing something to life that cannot organically exist) even achieve what it’s trying to do? Can a synthetic version of a memory replace the memory itself, or at least soothe a human's longing for something they cannot otherwise experience? Or does the AI alternative just amplify the longing? What exactly is the cognitive impact? Did this fake scene from The Office make me smile? Yes. Would I feel the same if I personally knew (or was) one of the characters? I don't think so.
Gazal D.
3w
AI answers are not single-source. They're composites. Built from clusters of sources that define a category together. Think about how it actually works: When someone asks an AI a question, it doesn't go find the #1 result and repeat it. It pulls from everything it knows about that category — the brands, voices, and content that have consistently shown up together, saying coherent things, across multiple contexts. It builds a picture. And whoever is woven into that picture the most — across content, citations, mentions, category signals — is who gets surfaced in the answer. Which means your visibility isn't about being #1. It's about being contextually unavoidable. Not the loudest. Not the most followed. Not the highest ranked. Just — everywhere that matters, consistently enough, that when an AI constructs an answer about your category, leaving you out would make the answer incomplete. That's the new game. And most brands don't even know it's being played.
Prashank Yadav
4w
Built a Voice-Driven AI Agent Been working on a system that can listen, understand, and respond — either through voice or text — while maintaining context across the conversation. The goal wasn’t just to make it “work,” but to create a smooth interaction loop that feels natural, responsive, and continuous. It captures input (voice or text), processes it through an AI layer for reasoning, and generates responses in real time — including spoken output. The interesting part was stitching everything together: audio processing, real-time interaction, and keeping the flow consistent without breaking the experience. The end goal? A 3D character in a game engine that walks through real-world environments, responds to your voice in real time, and carries knowledge about whatever you feed it. #AI #BuildInPublic #MachineLearning #VoiceAI #IndieAI #UnrealEngine #LLM #RealtimeAI

Nehad El Leithy
4w
Let's teach our AI Context Engineering. The 5 Layers of Context:
1. Identity Context: Who is the AI acting as?
2. World Context: What does the AI need to know about your situation, business, and audience?
3. Task Context: What exactly needs to happen?
4. Example Context: What does great output look like? And bad output?
5. Constraint Context: What are the boundaries, rules, and non-negotiables?
Roger W
5d
The controversy around AI-assisted comic art is not really about “AI vs artists.” It is about trust. When a beloved IP suddenly shows inconsistent style or strange details, fans notice. If AI was used, the bigger issue becomes: was it controlled, reviewed, and disclosed? For commercial content, AI can improve production speed. But speed cannot be allowed to break visual consistency, authorship, or audience trust. AI in creative pipelines needs a rule: use it as a tool, not as an excuse. #AIArt #AIGC #Comics #CreativeIndustry #IPManagement #CreatorEconomy #AIWorkflow #DigitalArt #ContentStrategy
