Why do so many AI voices still sound… slightly off?

It’s usually not pronunciation. And it’s not just the quality of the voice itself. It’s timing.

In both speech and singing, humans rely on subtle patterns of:
- emphasis
- rhythm
- expectation

When those patterns are even slightly misaligned, something feels unnatural, even if we can’t quite explain why. This is the domain of prosody: the musical side of language.

In my own work, I’ve been thinking a lot about how these principles translate into voice AI: how we evaluate naturalness, how expressivity is perceived, and where technical accuracy diverges from human experience. I’m especially interested in how insights from vocal performance and musical phrasing might inform the next generation of voice systems.

Curious to see how others are approaching this problem.
Understanding AI Voice Timing and Prosody Challenges
More Relevant Posts
-
Whenever I hear “X is just a tool”, which lately has meant AI, I am reminded of Father John Culkin’s summary of his friend Marshall McLuhan’s writings on media: “We shape our tools and then our tools shape us.” As philosopher and cognitive scientist Andy Clark has suggested, our tools become a kind of “neurological scaffolding” in which our thinking spills out into the world, or, in McLuhan’s language, electronic media becomes an extension of our central nervous system. I wrote about this extensively in my dissertation, largely as a means of connecting media theory to musicians’ interactions with their instruments. McLuhan correctly identified how any kind of offloading to a tool atrophies the part of ourselves we extended. Music, however, is a case where this doesn’t seem to happen, and I think this can be applied to most forms of art in general. I think this is because the musical instrument or the artist’s brush isn’t meant to eliminate labor, but to add to it. We don’t switch from finger painting to watercolor because it’s less work, but because it offers more possibility for detail. This is why I view AI as fundamentally anti-art: there is no possibility to engage it more deeply for further detail. Darren Aronofsky's slop looks like the slop made by random yahoos on LinkedIn. You will never get around this. If you think you have cracked it, that's more an indication of Dunning-Kruger and your lack of standards than of your skill. To everyone else it looks like the slop everyone else churns out with the slop machine. It could have gone differently, but these systems were designed from the ground up by people with utter contempt for the artistic process. And look, not to sound jealous (especially because I do know some of the founders personally), but these are nerdy mouthbreathers who don't understand art or why people care about it.
They hate that it's mysterious, that being smart has nothing to do with being good at it, and that you cannot become a good artist just by following the rules, so they have put the full power of the capitalist global economy into destroying it. They have failed miserably. If you are still making slop in 2026 and it's not an ironic commentary on the uncanniness of slop, I'm going to assume you are either being paid to promote it, or, and this really is the saddest possible case, you have deluded yourself that by presenting the output of a probabilistic interpolation of other people's ideas as personal expression, you are making serious art. Everyone sees it and no one thinks it's interesting or cool. Slop is the artistic equivalent of the heat death of the universe: nothing bad or good, just endless meaningless mediocrity.
-
Five different AIs – one shared creation: 𝐒𝐰𝐢𝐧𝐠 𝐭𝐡𝐞 𝐜𝐨𝐠𝐧𝐚𝐜. I explored how generative AI can collaborate across text, music, image, and video to create a complete, educational experience. A practical example of cross‑disciplinary AI production – from concept to digital artist. Listen to the song 🎶 and read more about the process below👇 https://lnkd.in/eZHTHXER #cognac BNIC - Cognac
-
The conversation around AI music keeps getting stuck in the same place. Can people hear the difference? Does it sound human enough? Will anyone even notice? Honestly, I think that whole discussion misses the point. Whether people can hear the difference is not the real question. The real question is whether they can feel what is missing. And the uncomfortable truth is that many people barely know how to feel in the first place. We’ve become so disconnected from our own internal signals that the conversation keeps circling around surface-level output: how it sounds, how polished it is, and how convincing it appears. But the part that interests me has never been the final output. It’s what set it in motion. What was felt. What was processed. What moved through someone before anything was made. That is the layer I think we should be paying attention to. #HumanSignal #AI #Creativity #SystemsThinking
-
Why Your “Taste” is More Valuable Than Your Voice in 2026

“Technology always wins.” Diplo’s recent interview (April 2026) is a masterclass in staying relevant during a tech revolution. While purists are fighting the tide, the top 1% are using AI to remove friction.

Key Takeaways:
1. The Trust Gap: People follow Diplo for his taste, not his tools. AI can’t replace a brand.
2. Musical Vocabulary: The better you know music history, the better you can prompt the AI.
3. The Hybrid Model: Using AI-generated foundations (like the Splice loops in “Espresso”) to build global hits.

The barrier to entry has never been lower, which means the value of human judgment has never been higher.
-
I use AI in my music. Not as a shortcut — more like an instrument. What I’ve learned is this: The tool doesn’t decide the outcome. Direction does. You can give the same prompt to ten people and get ten completely different results. That’s where taste, restraint, and intention start to matter. AI didn’t remove creativity. If anything, it made it easier to see who’s actually guiding it.
-
The conversation around AI and music usually focuses on generation. But the more important question is upstream: who controls the training data? Holly Herndon and Mat Dryhurst laid this out clearly at DLD 24. Their project Have I Been Trained gives creators visibility into where their work appears in AI training sets, with mechanisms to opt out or negotiate usage. For anyone managing artist catalogs or running campaigns, this is becoming a practical concern, not just a philosophical one. Understanding where your artist’s catalog sits in AI training pipelines, and having tools to act on it, is going to be table stakes. The power dynamic shifts when creators own their digital likeness and set the terms of engagement. That’s the direction worth tracking.
-
I'm curious... what do you do while your AI is working on a task that takes a few seconds or a few minutes? I used to switch between tasks, but sometimes I found myself doing 3 things at once, and I was exhausted by the end of the day. Then I decided to just stick with the task at hand and watch the AI's output as it came in, but my brain gets suuuuuuuuuper tired trying to keep up with its speed. So now, if the output is going to take less than 2 min, I just listen carefully to whatever music I have on at the moment.
-
Adopting AI in your business is like learning to play jazz. The tools are the instruments, full of potential, but the real magic happens in how you integrate them into your existing band—your business processes. Just as in jazz, where improvisation relies on a deep understanding of music theory and harmony, successful AI implementation depends on solid business knowledge and strategy. Before you rush to add that shiny new instrument, make sure your band is ready to harmonize with it. Transform your operations by focusing on your band's strengths and only then adding AI to the mix.
-
AI takeover? Nope. AI that makes you show up louder for your people. 🎤

We’ve read the “Experts Weigh In” angle like this: the good side of AI is what we choose to point it at: more voice, more access, more real connection.

Here’s our one actionable tip for musicians: Record 20 minutes of raw tour/room-diaries on your phone. Use a voice-to-text AI tool to turn it into:
- 1 weekly newsletter
- 3 behind-the-song captions

Then A/B test two different hook lines to see what your fans reply to. If your next post comes from the same notes you already made, you’ll show up more consistently.

Watch us on YouTube: https://lnkd.in/gzZ3b64B #IndieMusic #AIForArtists #MusicMarketing #Songwriting
-
There are moments when you can feel the ground shifting beneath an entire industry. This conversation with Sourabh Pateriya, Founder & CEO of Soundverse AI, sits right in the middle of one of those moments. We’re not just talking about AI and music. We’re talking about authorship, ownership, access, and what it means to create when the tools themselves are evolving faster than our ability to define them. Sourabh brings a rare perspective to this space. Engineer. Designer. Musician. Product thinker. From co-inventing Basic Pitch at Spotify to building an ethical, artist-first AI platform, his work raises important questions: What happens when creativity becomes conversational? What happens when anyone can create? And how do we ensure artists aren’t left behind in the process? We also go further than theory. At the end of this episode, you’ll hear a track created by Stella Heppleston using Soundverse AI, not as a concept, but as a real example of what this looks like in practice. That’s where this gets interesting. Not AI replacing the artist, but new tools expanding what’s possible. Watch or listen to this episode of Life With Strings Attached. And as always, if it resonates, pass it on. https://lnkd.in/gEYrypQK Paul Heppleston
Should Musicians Be Worried About AI? | Sourabh Pateriya | LWSA Ep 103