What does it take to evaluate voice models in the real world? We ran a large‑scale Indic TTS evaluation, stress‑testing Sarvam AI’s Bulbul V3 against ElevenLabs and Cartesia, backed by 44,000+ human votes from over 1,000 listeners. This wasn’t about synthetic benchmarks. It was about how voices actually sound to people, across languages, accents, and real listening conditions. For TTS systems, especially in India, human‑centric evaluation isn’t optional infrastructure. It’s how quality is defined.
Real-World TTS Evaluation: Sarvam AI's Bulbul V3 vs ElevenLabs and Cartesia
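The post doesn't describe the voting protocol, but large-scale listening tests like this typically collect pairwise A/B preferences from listeners. A minimal sketch of turning such votes into per-model win rates (the model labels and vote data below are illustrative placeholders, not the study's results):

```python
from collections import defaultdict

def win_rates(votes):
    """Aggregate pairwise preference votes into per-model win rates.

    votes: list of (model_a, model_b, winner) tuples, where winner
    is whichever of the two models the listener preferred.
    """
    wins = defaultdict(int)
    trials = defaultdict(int)
    for a, b, winner in votes:
        trials[a] += 1
        trials[b] += 1
        wins[winner] += 1
    # Win rate = comparisons won / comparisons participated in.
    return {m: wins[m] / trials[m] for m in trials}

# Hypothetical votes for illustration only:
votes = [
    ("bulbul-v3", "elevenlabs", "bulbul-v3"),
    ("bulbul-v3", "cartesia", "cartesia"),
    ("elevenlabs", "cartesia", "elevenlabs"),
    ("bulbul-v3", "elevenlabs", "bulbul-v3"),
]
print(win_rates(votes))
```

Real studies usually go a step further and fit a preference model (e.g. Bradley-Terry) so that win rates account for which opponents each model faced.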
More Relevant Posts
-
This felt very insightful and important with regard to where we are with artificial intelligence in the form of large language models. Nothing about these trapped ghosts made of language, or how their capabilities relate to human and non-human intelligence, is settled or self-evident. https://lnkd.in/di-QJGTK
-
Since the resurgence of AI in 2022 following the breakthrough of ChatGPT, one might have expected classical AI languages such as Lisp and Prolog to experience a revival. Yet that revival did not occur. Although Prolog continues to appear in certain niche rule-based systems and logical engines where traceability and formal reasoning remain important, classical AI languages have largely become marginalized. Several factors explain this historical and technological shift. https://lnkd.in/g4rMPkSF
Why Classical AI Languages Like Lisp and Prolog Are Out of Favor?
https://www.youtube.com/
-
We recently hosted an insightful talk by Gorjan Radevski, Researcher at NEC Laboratories Europe, on compositional steering tokens – a new method for guiding large language models (LLMs) to follow multiple behaviors simultaneously by embedding behavioral instructions directly into input tokens. Gorjan explained how these tokens generalize to unseen behavior combinations and outperform existing steering approaches across different LLM architectures. To learn more, watch: https://lnkd.in/dQT4eJwv. #NECLabs #AI #largelanguagemodels
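The talk's exact mechanism isn't detailed in this post; as a rough illustration of the idea of embedding behavioral instructions directly into input tokens, the sketch below prepends one steering embedding per requested behavior to the prompt embeddings, so that any subset of behaviors can be combined. The embeddings here are randomly initialized; in the actual method they would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # hidden size, illustrative

# One steering embedding per behavior (random stand-ins for
# what would be trained vectors in the real method).
steering = {
    "concise": rng.normal(size=(1, d_model)),
    "formal": rng.normal(size=(1, d_model)),
}

def prepend_steering(input_embs, behaviors):
    """Prepend one steering token per requested behavior to the
    input embedding sequence, so the model conditions on all of
    them at once (compositional: behaviors combine freely)."""
    prefix = np.concatenate([steering[b] for b in behaviors], axis=0)
    return np.concatenate([prefix, input_embs], axis=0)

tokens = rng.normal(size=(5, d_model))  # stand-in for an embedded prompt
out = prepend_steering(tokens, ["concise", "formal"])
print(out.shape)  # (7, 16): 2 steering tokens + 5 prompt tokens
```

The appeal of this style of conditioning is that nothing in the model architecture changes; the behavioral instruction lives entirely in the input sequence.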
-
Not sure what to believe here on stinked-in: the algorithm is telling me we're all doomed because AI is sentient. The YouTube algo is doing the same. Anyway, on-device, real-time bilingual speech transcription for Aotearoa is now available. That's for real! Please subscribe to keep the rebellion alive 🙌🏽 https://papareo.io/piki
-
What is STT (speech to text)? For most of us, it's a mystery how something as seemingly simple as "speech to text" actually works. Since I have personally been working heads-down on building efficient, accurate, and disruptively priced speech-to-text AI models for Indian languages, I thought I'd take this opportunity to share how it all works. It is a rather long read, but written so that it stays light. So enjoy, and comment:
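The linked write-up carries the full explanation, but the first step of nearly every STT pipeline is the same: slice the waveform into short overlapping frames and compute per-frame spectral features, which an acoustic model then maps to text. A minimal sketch of that front end (frame and hop sizes are the common 25 ms / 10 ms at 16 kHz; a production system would go on to apply a mel filterbank and a log):

```python
import numpy as np

def spectrogram(audio, frame_len=400, hop=160):
    """Slice audio into overlapping windowed frames and take the
    magnitude FFT of each, yielding a (num_frames, num_bins)
    feature matrix, the standard input to an acoustic model."""
    frames = []
    for start in range(0, len(audio) - frame_len + 1, hop):
        frame = audio[start:start + frame_len] * np.hanning(frame_len)
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.stack(frames)

# One second of dummy 16 kHz audio: a pure 440 Hz tone.
sr = 16000
t = np.arange(sr) / sr
feats = spectrogram(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (98, 201)
```

Everything after this step, mapping feature frames to characters or words, is where models for different languages (including Indian languages) diverge.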
-
This Substack post by The Strategic Linguist covers some really important considerations about AI gaslighting discourse: meeting summaries, draft emails, confident contradictions, and performance reviews are changing the way organisational information is generated. This professional replacement of language has no mechanism for catching misrepresentation of information. So why are we giving the power of our thoughts and words to a gaslighter))) I think AI-system disclaimers requiring us to proof the content produced increase workload, and don't maintain the integrity of linguistic culture or the epistemic value of the person or workplace. Gaslighting structures would be challenging to discern for inexperienced writers or people trying to develop a voice at work. https://lnkd.in/gP6swNZD
When Algorithms Gaslight: How AI Systems Reproduce Reality-Distorting Discourse Without Intent
thestrategiclinguist.substack.com
-
Big update from CallHQ 🚀 We’ve integrated Sarvam AI voices (Made in India) to support local languages and reduce calling costs. Which means AI calls on CallHQ now start from just ₹2.5. Better voices. Local language support. Lower cost AI calling.
-
Exciting news! Live Translation, now in beta with support for 50+ languages, is built directly into the T-Mobile network. When a language barrier gets in the way of a call, whether that's with family, customers, doctors, or business partners, the network should help. Using network-native AI, conversations are translated in real time within our network, with no third-party access or external storage. This reflects a bigger shift in how we think about the network: not just as infrastructure, but as something that actively helps people connect. Sign up for the beta here: https://lnkd.in/gp9QcMZX
-
Most Voice AI assistants fail because they lack the "human" ingredients: timing, tone, and empathy. At VoiceGenie, we’ve obsessed over these details so your customers never have to wonder if they’re talking to a machine or a person. Using ElevenLabs voices under the hood allows our assistants to respond instantly, adapt naturally across languages, and maintain conversational flow, without the pauses or rigidity that give AI away. The result? ✅ Zero awkward pauses. ✅ Brand-aligned personality. ✅ Seamless human handovers.