From the course: AI Evaluations: Foundations and Practical Examples
Demo of fully functional human and auto-evaluator systems
- [Instructor] I know there are a lot of challenges in building AI agents for production. As we go along in this course, we'll build real applications like this chatbot, where you can upload a contract, ask questions, and get answers. Not only that, we'll show you how you can use experts to build vertical agents, like the legal agent here, and use their knowledge to set up manual evaluation. That covers not just accuracy, but also how subjective qualities like helpfulness, harmlessness, and honesty can be articulated as something you can judge and build your evaluations around. Further, we will look beyond human experts at how you can scale these evaluations. As I'm showing here, we will use an LLM judge to produce these values rather than just calling on human experts to do our evaluations. And then we will figure out how you can put logging in place to check what the latency in your system is, how many tokens it consumes, when it fails, when it…
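To make the pieces mentioned above concrete, here is a minimal sketch of what one evaluation record could look like: rubric scores for accuracy, helpfulness, harmlessness, and honesty, plus latency, token usage, and failure logging. Everything named here is an assumption for illustration and not the course's actual code: `stub_agent` stands in for the contract chatbot, `stub_judge` stands in for a human expert or an LLM judge, the 1-5 scale is arbitrary, and the token count is a crude whitespace proxy rather than a real tokenizer.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Rubric dimensions named in the transcript; the 1-5 scale is an assumption.
RUBRIC = ("accuracy", "helpfulness", "harmlessness", "honesty")


@dataclass
class EvalRecord:
    question: str
    answer: str
    scores: Dict[str, int]   # one score per rubric dimension, empty on failure
    latency_s: float         # wall-clock time spent in the agent call
    tokens: int              # whitespace-word proxy, not a real tokenizer count
    error: Optional[str]     # repr of the exception if the agent failed


def stub_agent(question: str) -> str:
    """Hypothetical stand-in for the contract chatbot."""
    return f"Stub answer to: {question}"


def stub_judge(question: str, answer: str) -> Dict[str, int]:
    """Hypothetical stand-in for a human expert or LLM judge (fixed scores)."""
    return {dimension: 4 for dimension in RUBRIC}


def evaluate(
    question: str,
    agent: Callable[[str], str],
    judge: Callable[[str, str], Dict[str, int]],
) -> EvalRecord:
    """Run the agent once, time it, and score the answer with the judge."""
    start = time.perf_counter()
    try:
        answer, error = agent(question), None
    except Exception as exc:  # log the failure instead of crashing the run
        answer, error = "", repr(exc)
    latency_s = time.perf_counter() - start

    tokens = len(question.split()) + len(answer.split())
    scores = judge(question, answer) if error is None else {}
    return EvalRecord(question, answer, scores, latency_s, tokens, error)


if __name__ == "__main__":
    print(evaluate("What is the termination clause?", stub_agent, stub_judge))
```

Because the judge is passed in as a plain callable, the same record format works whether the scores come from a human reviewer or from an automated judge, which is the scaling step the instructor describes.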
Contents
- Demo of fully functional human and auto-evaluator systems (2m)
- What are AI agents? (3m 49s)
- Why a lot of AI agents fail (3m 12s)
- Understanding the "moat" in AI agents (2m 50s)
- Evaluating the moat and backbone of your AI agents (4m 28s)
- Challenges in setting proprietary AI evaluations (2m 50s)