What does it take to evaluate voice models in the real world? We ran a large‑scale Indic TTS evaluation, stress‑testing Sarvam AI’s Bulbul V3 against ElevenLabs and Cartesia, backed by 44,000+ human votes from over 1,000 listeners. This wasn’t about synthetic benchmarks. It was about how voices actually sound to people, across languages, accents, and real listening conditions. For TTS systems, especially in India, human‑centric evaluation isn’t optional infrastructure. It’s how quality is defined.
Real-World TTS Evaluation: Sarvam AI's Bulbul V3 vs ElevenLabs and Cartesia
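The post doesn't describe the voting protocol, but large-scale listening tests like this typically collect pairwise A/B preferences from listeners. A minimal sketch of turning such votes into per-model win rates (the model labels and vote data below are illustrative placeholders, not the study's results):

```python
from collections import defaultdict

def win_rates(votes):
    """Aggregate pairwise preference votes into per-model win rates.

    votes: list of (model_a, model_b, winner) tuples, where winner
    is whichever of the two models the listener preferred.
    """
    wins = defaultdict(int)
    trials = defaultdict(int)
    for a, b, winner in votes:
        trials[a] += 1
        trials[b] += 1
        wins[winner] += 1
    # Win rate = comparisons won / comparisons participated in.
    return {m: wins[m] / trials[m] for m in trials}

# Hypothetical votes for illustration only:
votes = [
    ("bulbul-v3", "elevenlabs", "bulbul-v3"),
    ("bulbul-v3", "cartesia", "cartesia"),
    ("elevenlabs", "cartesia", "elevenlabs"),
    ("bulbul-v3", "elevenlabs", "bulbul-v3"),
]
print(win_rates(votes))
```

Real studies usually go a step further and fit a preference model (e.g. Bradley-Terry) so that win rates account for which opponents each model faced.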
More Relevant Posts
-
This felt very insightful and important with regard to where we are with artificial intelligence in the form of large language models. Nothing about these trapped ghosts made of language, or how their capabilities relate to human and non-human intelligence, is settled or self-evident. https://lnkd.in/di-QJGTK
-
Since the resurgence of AI in 2022 following the breakthrough of ChatGPT, one might have expected classical AI languages such as Lisp and Prolog to experience a revival. Yet that revival did not occur. Although Prolog continues to appear in certain niche rule-based systems and logical engines where traceability and formal reasoning remain important, classical AI languages have largely become marginalized. Several factors explain this historical and technological shift. https://lnkd.in/g4rMPkSF
Why Classical AI Languages Like Lisp and Prolog Are Out of Favor?
https://www.youtube.com/
-
We recently hosted an insightful talk by Gorjan Radevski, Researcher at NEC Laboratories Europe, on compositional steering tokens – a new method for guiding large language models (LLMs) to follow multiple behaviors simultaneously by embedding behavioral instructions directly into input tokens. Gorjan explained how these tokens generalize to unseen behavior combinations and outperform existing steering approaches across different LLM architectures. To learn more, watch: https://lnkd.in/dQT4eJwv. #NECLabs #AI #largelanguagemodels
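The talk's exact mechanism isn't detailed in this post; as a rough illustration of the idea of embedding behavioral instructions directly into input tokens, the sketch below prepends one steering embedding per requested behavior to the prompt embeddings, so that any subset of behaviors can be combined. The embeddings here are randomly initialized; in the actual method they would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # hidden size, illustrative

# One steering embedding per behavior (random stand-ins for
# what would be trained vectors in the real method).
steering = {
    "concise": rng.normal(size=(1, d_model)),
    "formal": rng.normal(size=(1, d_model)),
}

def prepend_steering(input_embs, behaviors):
    """Prepend one steering token per requested behavior to the
    input embedding sequence, so the model conditions on all of
    them at once (compositional: behaviors combine freely)."""
    prefix = np.concatenate([steering[b] for b in behaviors], axis=0)
    return np.concatenate([prefix, input_embs], axis=0)

tokens = rng.normal(size=(5, d_model))  # stand-in for an embedded prompt
out = prepend_steering(tokens, ["concise", "formal"])
print(out.shape)  # (7, 16): 2 steering tokens + 5 prompt tokens
```

The appeal of this style of conditioning is that nothing in the model architecture changes; the behavioral instruction lives entirely in the input sequence.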
-
Not sure what to believe here on stinked-in: the algorithm is telling me we're all doomed because AI is sentient. The YouTube algo is doing the same. Anyway, on-device, real-time bilingual speech transcription for Aotearoa is now available. That's for real! Please subscribe to keep the rebellion alive 🙌🏽 https://papareo.io/piki
-
What is STT (speech to text)? For most of us, it's a mystery how something as seemingly simple as "speech to text" actually works. Since I have personally been working heads-down on building efficient, accurate, and disruptively priced speech-to-text AI models for Indian languages, I thought I'd take this opportunity to share how it all works. It is a rather long read, but written so that it stays light. So enjoy, and comment:
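The linked write-up carries the full explanation, but the first step of nearly every STT pipeline is the same: slice the waveform into short overlapping frames and compute per-frame spectral features, which an acoustic model then maps to text. A minimal sketch of that front end (frame and hop sizes are the common 25 ms / 10 ms at 16 kHz; a production system would go on to apply a mel filterbank and a log):

```python
import numpy as np

def spectrogram(audio, frame_len=400, hop=160):
    """Slice audio into overlapping windowed frames and take the
    magnitude FFT of each, yielding a (num_frames, num_bins)
    feature matrix, the standard input to an acoustic model."""
    frames = []
    for start in range(0, len(audio) - frame_len + 1, hop):
        frame = audio[start:start + frame_len] * np.hanning(frame_len)
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.stack(frames)

# One second of dummy 16 kHz audio: a pure 440 Hz tone.
sr = 16000
t = np.arange(sr) / sr
feats = spectrogram(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (98, 201)
```

Everything after this step, mapping feature frames to characters or words, is where models for different languages (including Indian languages) diverge.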
-
This Substack post by The Strategic Linguist covers some really important considerations about AI gaslighting discourse: meeting summaries, draft emails, confident contradictions, and performance reviews are changing the way organisational information is generated. This professional replacement of language has no mechanism for catching misrepresentation of information. So why are we giving the power of our thoughts and words to a gaslighter))) I think AI-system disclaimers requiring us to proof the content produced increase workload, and don't maintain the integrity of linguistic culture or the epistemic value of the person or workplace. Gaslighting structures would be challenging to discern for inexperienced writers or people trying to develop a voice at work. https://lnkd.in/gP6swNZD
When Algorithms Gaslight: How AI Systems Reproduce Reality-Distorting Discourse Without Intent
thestrategiclinguist.substack.com
-
Big update from CallHQ 🚀 We’ve integrated Sarvam AI voices (Made in India) to support local languages and reduce calling costs. Which means AI calls on CallHQ now start from just ₹2.5. Better voices. Local language support. Lower cost AI calling.
-
Exciting news! Live Translation, now in beta with support for 50+ languages, is built directly into the T-Mobile network. When a language barrier gets in the way of a call, whether that's with family, customers, doctors, or business partners, the network should help. Using network-native AI, conversations are translated in real time within our network, with no third-party access or external storage. This reflects a bigger shift in how we think about the network: not just as infrastructure, but as something that actively helps people connect. Sign up for the beta here: https://lnkd.in/gp9QcMZX
-
Most Voice AI assistants fail because they lack the "human" ingredients: timing, tone, and empathy. At VoiceGenie, we’ve obsessed over these details so your customers never have to wonder if they’re talking to a machine or a person. Using ElevenLabs voices under the hood allows our assistants to respond instantly, adapt naturally across languages, and maintain conversational flow, without the pauses or rigidity that give AI away. The result? ✅ Zero awkward pauses. ✅ Brand-aligned personality. ✅ Seamless human handovers.