🤖 We benchmarked ElevenAgents. Here's what we found.

Building a great voice agent isn't just about a great model; it's about every layer of the stack working together under real conditions. That's why we recently released EVA-Bench, our open-source framework for evaluating voice agents end-to-end across EVA-A (Accuracy) and EVA-X (Experience).

This week, we evaluated ElevenLabs' ElevenAgents, and here are the results:

• Scribe v2.2 Realtime is the strongest STT model we tested. Transcription accuracy on key entities was above 95%, and it held above 93% with a French accent and coffee shop background noise.
• Eleven Flash v2 scored above 97% on Speech Fidelity, the only metric in any end-to-end voice agent benchmark that evaluates what the agent actually says out loud.
• Of all 16 systems we benchmarked, both ElevenAgents configurations landed on the Pareto frontier. With Claude Haiku, ElevenAgents had the highest EVA-X score; with GPT-5.4, the highest EVA-A.

Shaheen Lavie-Rouse, FDE at ElevenLabs, said it best: "EVA-Bench is the first benchmark that actually measures voice agent quality end-to-end - from transcription to task completion to spoken output - in a way that's actionable for people building a voice agent. We're also glad to see two different ElevenAgents configurations on the Pareto frontier."

This is one of many evaluations we'll be releasing. If you're building voice agents, or building the models that power them, run EVA-Bench on your stack and show us where you land.

🔎 Want to know more?
🌐 Website: https://lnkd.in/eaRnvm7G
💻 Code: https://lnkd.in/e53m3GYe
🗂️ Dataset: https://lnkd.in/erCqPkc6
📄 Paper: https://lnkd.in/e_ShvWDw

Technical Contributors: Tara Bogavelli, Gabrielle Gauthier Melançon, Katrina Stankiewicz, Oluwanifemi Bamgbose, Fanny Riols, Hoang Nguyen, Raghav Mehndiratta, Lindsay Brin, Joseph Marinier, Hari Subramani

Leadership & Product: Sridhar Krishna Nemala, Anil Kumar Madamala, Srinivas Sunkara, Joyce Li, Nitin Aggarwal

#VoiceAI #VoiceAgents #AIResearch #ConversationalAI #ServiceNowResearch #ElevenLabs
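
For readers unfamiliar with the term, here is a minimal sketch of what "landing on the Pareto frontier" means when systems are compared on two scores such as EVA-A and EVA-X. It assumes higher is better on both axes. The system names and scores are made-up placeholders, not EVA-Bench results, and the `System`, `dominates`, and `pareto_frontier` names are hypothetical helpers for illustration, not part of the EVA-Bench code.

```python
from typing import List, NamedTuple


class System(NamedTuple):
    """One benchmarked voice agent configuration (hypothetical structure)."""
    name: str
    eva_a: float  # accuracy score, assumed higher is better
    eva_x: float  # experience score, assumed higher is better


def dominates(p: System, q: System) -> bool:
    """True if p is at least as good as q on both scores and strictly better on one."""
    at_least_as_good = p.eva_a >= q.eva_a and p.eva_x >= q.eva_x
    strictly_better = p.eva_a > q.eva_a or p.eva_x > q.eva_x
    return at_least_as_good and strictly_better


def pareto_frontier(systems: List[System]) -> List[System]:
    """Keep every system that no other system dominates."""
    return [s for s in systems if not any(dominates(o, s) for o in systems)]


# Placeholder scores for illustration only; these are NOT benchmark results.
systems = [
    System("system-1", 0.80, 0.60),
    System("system-2", 0.70, 0.75),
    System("system-3", 0.65, 0.70),  # dominated by system-2
    System("system-4", 0.78, 0.55),  # dominated by system-1
]

frontier_names = [s.name for s in pareto_frontier(systems)]
print(frontier_names)
```

In this toy data, system-1 leads on accuracy and system-2 leads on experience, so neither dominates the other and both sit on the frontier. That is the same shape of result the post describes: two configurations can both be Pareto-optimal while each tops a different axis.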