“The only scalable way to measure complex metrics like hallucination and conciseness is by using another LLM tuned and trained to evaluate your model’s responses.” That was Oguzhan (Ouz) Gencoglu, Co-Founder & Head of AI at Root Signals, on this week’s episode of the (AI) People podcast. We explored why traditional MLOps metrics can’t capture what really matters in LLM-powered products. Ouz believes LLM-as-a-Judge is not just useful; it’s the only way to get reliable, scalable evaluation without relying on armies of humans. 👉 Would you trust an LLM to judge another LLM’s output?
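For anyone wondering what LLM-as-a-Judge looks like in practice, here is a minimal sketch of the idea, not Root Signals' implementation: a second model scores a candidate answer against a criterion. The judge model name, prompt wording, and 1–5 scale below are illustrative assumptions; it assumes the OpenAI Python SDK and an API key in the environment.

```python
# Minimal LLM-as-a-judge sketch (illustrative only; not Root Signals' product).
# Assumes the OpenAI Python SDK and OPENAI_API_KEY are set; the model name,
# prompt wording, and 1-5 scale are placeholder choices.
from openai import OpenAI

client = OpenAI()

def judge_response(question: str, answer: str, criterion: str = "faithfulness") -> int:
    """Ask a judge model to score an answer on a 1-5 scale for the given criterion."""
    prompt = (
        f"You are an evaluator. Rate the ANSWER to the QUESTION for {criterion} "
        "on a scale of 1 (poor) to 5 (excellent). Reply with the number only.\n\n"
        f"QUESTION: {question}\nANSWER: {answer}"
    )
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic scoring for repeatable evaluation
    )
    return int(completion.choices[0].message.content.strip())

# Example: score a candidate answer for faithfulness (a proxy for hallucination).
score = judge_response(
    question="What year was the Eiffel Tower completed?",
    answer="The Eiffel Tower was completed in 1889.",
)
print(score)
```

The point of the pattern is that this scoring call can run over thousands of responses automatically, which is what makes it scale where human review does not.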

hmm but who's judging the judge tho? seems like we're just kicking the trust problem up one level..

Thanks for having me! Enjoyed our chat, Ben Jackson.
