Error analysis and metric design

From the course: Agentic AI: Build Your First Agentic AI System



Now, we've verified that our agent works for a single test case. But as we learned in the previous video, we want to build an evaluation dataset and make sure that the agent works for more test cases and that we're ready to go to production. That's what the continuous calibration phase is all about. To do this, we'll use Arize Phoenix like we did in the previous chapter. The setup is pretty much the same. We'll have Phoenix ready for us on port 6006, which we can open in the browser to check. Again, since we've not run any experiments yet, you will see this basic dashboard. We can set it up so that our traces start flowing in. I've set up about 22 test cases for the planning agent, and these test cases are meant to cover the edge cases as well as the complex user queries that come in. You can check them out here. Again, remember that in the real world a lot of this should be set up using historical data or subject matter experts so that you can get an accurate…
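To make the idea of an evaluation dataset concrete, here is a minimal sketch in plain Python. It does not use Phoenix, and it is not the course's code: the `EvalCase` fields, the `planning_agent` stub, the tool names, and the four sample queries are all hypothetical, chosen only to show how running several test cases yields an overall metric plus a per-category breakdown of failures for error analysis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class EvalCase:
    query: str
    expected_tool: str  # hypothetical label: the tool the plan should pick
    category: str       # e.g. "simple", "edge_case", "complex_query"


def planning_agent(query: str) -> str:
    """Hypothetical stand-in for the real planning agent; returns a tool name."""
    text = query.lower()
    if "weather" in text:
        return "weather_lookup"
    if "flight" in text:
        return "flight_search"
    return "ask_clarification"


def evaluate(cases):
    """Run every case; return (accuracy, failure counts grouped by category)."""
    failures = Counter()
    passed = 0
    for case in cases:
        if planning_agent(case.query) == case.expected_tool:
            passed += 1
        else:
            failures[case.category] += 1
    accuracy = passed / len(cases) if cases else 0.0
    return accuracy, failures


# Illustrative cases only; a real dataset would come from historical data
# or subject matter experts.
cases = [
    EvalCase("What's the weather in Paris?", "weather_lookup", "simple"),
    EvalCase("Find me a flight to Tokyo", "flight_search", "simple"),
    EvalCase("", "ask_clarification", "edge_case"),
    EvalCase("Book a flight if the weather is good", "flight_search", "complex_query"),
]

accuracy, failures = evaluate(cases)
print(f"accuracy={accuracy:.2f}", dict(failures))
```

Grouping failures by category is one simple way to see where the agent struggles. In this toy run the only miss is the complex query, which suggests that bucket deserves a closer look before anything else.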
