From the course: Agentic AI: Build Your First Agentic AI System
Error analysis and metric design
Now we've verified that our agent works for a single test case. But as we learned in the previous video, we want to build an evaluation dataset and make sure the agent works across many more test cases before we're ready to go to production. That's what the continuous calibration phase is all about. To do this, we'll use Arize Phoenix, like we did in the previous chapter. The setup is pretty much the same. We'll have Phoenix running on port 6006, where we can open its web UI. Again, since we haven't run any experiments yet, you'll see only the basic dashboard. We can set it up so that our traces start flowing in. I've set up about 22 test cases for the planning agent, and these test cases are meant to cover the edge cases as well as the complex user queries that come in. You can check them out here. Again, remember that in the real world a lot of this should be set up using historical data or subject matter experts so that you can get an accurate…
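As a rough illustration of the evaluation dataset idea described above, here is a minimal sketch in plain Python. It is not the course's code and does not use the Phoenix API. The `TestCase` fields, the `toy_planning_agent` stand-in, the sample queries, and the exact-match pass criterion are all assumptions made for the example. A real dataset would come from historical data or subject matter experts, as noted above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestCase:
    """One row of a hypothetical evaluation dataset for a planning agent."""
    query: str
    expected_steps: list[str]
    category: str  # assumed labels: "edge_case" or "complex"


def toy_planning_agent(query: str) -> list[str]:
    """Hypothetical stand-in for the real agent: maps a query to plan steps."""
    if not query.strip():
        return ["ask_clarification"]
    steps = ["search"]
    if "compare" in query.lower():
        steps.append("compare")
    steps.append("summarize")
    return steps


def run_eval(agent: Callable[[str], list[str]],
             cases: list[TestCase]) -> list[tuple[TestCase, bool]]:
    """Run the agent on every case; a crash counts as a failure."""
    results = []
    for case in cases:
        try:
            passed = agent(case.query) == case.expected_steps
        except Exception:
            passed = False
        results.append((case, passed))
    return results


def pass_rate_by_category(
        results: list[tuple[TestCase, bool]]) -> dict[str, float]:
    """Aggregate pass/fail results into a pass rate per category."""
    totals: dict[str, tuple[int, int]] = {}
    for case, passed in results:
        ok, n = totals.get(case.category, (0, 0))
        totals[case.category] = (ok + int(passed), n + 1)
    return {cat: ok / n for cat, (ok, n) in totals.items()}


# Illustrative cases only; a real set would be larger and sourced from real usage.
CASES = [
    TestCase("", ["ask_clarification"], "edge_case"),
    TestCase("   ", ["ask_clarification"], "edge_case"),
    TestCase("Compare laptops under $1000",
             ["search", "compare", "summarize"], "complex"),
    TestCase("Compare hotels in Rome and book the cheapest",
             ["search", "compare", "book", "summarize"], "complex"),
]

results = run_eval(toy_planning_agent, CASES)
print(pass_rate_by_category(results))
```

Grouping the pass rate by category is one simple way to start error analysis: a low score in one category points to the kind of query that needs attention before adding more sophisticated metrics.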