From the course: Claude Code: Designing Multi-Model AI Systems
The single model trap: Why AI products fail at scale
Let's build something. I'm going to create a help desk chatbot: one model, one prompt, the simplest thing that works. Then we will see where it breaks.
We are in Visual Studio Code, and on the left-hand side we can see the GitHub repo. It contains the diagrams, including the architectural diagrams. It contains the prompts, the exact prompts that we will use to generate the code we want. And finally, it has some of the Python code that we will create together in the course.
We are building for a fictional product here. It's called Taskflow. Think of Asana or Trello. I picked the help desk because it's the perfect stress test: simple questions, hard questions, off-topic noise, and even people trying to break it.
So let's open up a terminal. Before that, make sure you have gone through the README and completed all of the setup steps, especially the Anthropic API key. Without the Anthropic API key we will get an error, because the model calls won't work.
Back to the terminal. I have saved the prompt for this video in the prompts folder, and every video in this course has one. If your generated code ever looks different from mine, you can rerun the prompt and get back on track. So let me grab the prompt we will be using, under the model trap.
Here we are giving Claude Code information about the project management tool that we are creating. We want to use Claude Sonnet via the Anthropic API. This is the key part: we are naming the model that we want to use. Then, our product has tiers: the free tier, the pro tier, the enterprise tier, and so on. We will also have color-coded terminal output: green bold for you and cyan bold for Taskflow responses, so it is easier to tell the chatbot apart from us. Then we also have the debug flag, which is very important. That will print a dim gray line after each response.
It will show the model name, input tokens, output tokens, latency in seconds, and, most importantly, the estimated cost. So we will see how much each query costs.
So let's grab this prompt that we have already built and go back to the terminal. To start Claude Code, we type claude and press Enter, and this opens up a Claude Code instance for us. I'm using both a token and an API key, so it is giving me a warning, which is okay at this point. We copy the prompt and paste it right here.
It's going to do its thing, and then it will build a Python file called helpdesk_bot. It will have the system prompt with all the product knowledge, and it will have one very important line: the single API call to the Claude model that we specified. When it asks whether we want to make this edit, click Yes or press 1 to have it continue. In a couple of minutes or less it will finish, and then we will see that every message our users ever send goes through this one API call.
Okay, perfect. We have the help desk bot here, so let's open it. We can see that our model is Claude Sonnet, which is what we asked Claude Code to use. Let's skim through the functions that it created. It created the main function, it also has the print banner, and it has all the details, maximum tokens and so on. Feel free to go through the code.
To test it, I'm going to open up a new terminal and run it with python, pointing at the src folder and the file name helpdesk_bot.py. Make sure you add the debug flag, which gives us the extra information that will help a lot. Okay, perfect. We are in our new chatbot, Taskflow Helpdesk. Congratulations! So why don't we ask the first question. Hmm, what should we ask?
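Before the first question, here is a minimal sketch of how a debug line like the one described above could be produced. It is not the code that Claude Code generated in the video. The `CallStats` class, the function names, the placeholder model string, and the per-token prices are all assumptions made for illustration, so check current pricing before relying on the numbers.

```python
from dataclasses import dataclass

# ASSUMPTION: illustrative prices in USD per million tokens.
# Check the current Anthropic pricing page before relying on them.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00

DIM = "\033[2m"    # ANSI code for dim text
RESET = "\033[0m"  # ANSI code that resets styling


@dataclass
class CallStats:
    """Hypothetical container for what one API call reported."""
    model: str
    input_tokens: int
    output_tokens: int
    latency_s: float


def estimate_cost(stats: CallStats) -> float:
    """Estimated cost in USD from token counts and the assumed prices."""
    return (
        stats.input_tokens * INPUT_USD_PER_MTOK
        + stats.output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


def format_debug_line(stats: CallStats) -> str:
    """Build the dim line shown after each response in debug mode."""
    cost = estimate_cost(stats)
    return (
        f"{DIM}[{stats.model} | in: {stats.input_tokens} | "
        f"out: {stats.output_tokens} | {stats.latency_s:.2f}s | "
        f"${cost:.6f}]{RESET}"
    )


if __name__ == "__main__":
    # Hypothetical numbers, not measurements from the video.
    example = CallStats("claude-sonnet (placeholder)", 500, 100, 1.25)
    print(format_debug_line(example))
```

The point of the sketch is that the cost is just token counts multiplied by prices, so anything that inflates input tokens on every call shows up directly in this line.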
Well, since we have features and we have different tiers, let's ask: what features does Taskflow Pro include? Let's see what it says. Okay, let me enlarge this so we can see a little better. It says that Taskflow Pro is priced at $12 per user per month and that it includes the following features: unlimited projects, Gantt charts, integrations, and so on. That's a pretty good answer, and it streams very nicely. We also have the debug output right here in gray, which is very important. It has the input tokens, the output tokens, and most importantly, the cost. That's a real product question, so paying for Sonnet in this case makes total sense.
Now watch this. Let's try Hello. I made a typo, but it figured out that I meant Hello: Hello, welcome to Taskflow Support. How can I help you? Well, look at how much we paid for this. We paid close to the same amount for just a simple hello. Same model, same pipeline. At scale, a quarter of our traffic might be greetings, because people may greet our chatbot before they ask a question. So it costs us money. Every one of them is burning us some cents and some Sonnet tokens.
It gets worse. Now watch this: what is the weather in Paris today? Okay, there is an answer right here: I'm only able to help with questions related to Taskflow, so this is outside of what I can do here. For a simple weather question we didn't get an answer, and on top of that we paid $0.002184. And this is the single model trap. Whether it hallucinated or politely declined, it is a full API call either way. That off-topic rejection costs about the same as a real answer. No routing, no fast path, no way to say, this doesn't need a model at all. That is the single model trap: it works, yes, we get an answer, until you care about what it costs.
So let's open the diagram. Under the model trap diagram, we will see what we have done.
So all the questions went through which model? The Claude Sonnet model. And then we got a response. This works until we care about what it costs, how fast it responds, or what happens when someone sends something unexpected. The fix isn't a better model. It's a better system, and that's what we are building here.
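To put rough numbers on why a greeting costs nearly as much as a real answer, here is a small sketch. One likely reason, assuming the system prompt with all the product knowledge is sent on every call, is that those input tokens set a cost floor no matter how short the user message is. Every number below is hypothetical: the prices, the token counts, and the traffic mix. Only the idea that a quarter of traffic might be greetings comes from the video.

```python
from dataclasses import dataclass

# ASSUMPTIONS (all hypothetical): prices, token counts, and traffic mix.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00
SYSTEM_PROMPT_TOKENS = 600  # product knowledge sent with every call


@dataclass(frozen=True)
class MessageKind:
    name: str
    share: float        # fraction of total traffic
    user_tokens: int    # tokens in the user's message
    output_tokens: int  # tokens in the model's reply


TRAFFIC = [
    MessageKind("product question", 0.65, 12, 120),
    MessageKind("greeting", 0.25, 3, 20),
    MessageKind("off-topic", 0.10, 10, 35),
]


def cost_per_call(kind: MessageKind) -> float:
    """Cost in USD of one call; the system prompt is paid for every time."""
    input_tokens = SYSTEM_PROMPT_TOKENS + kind.user_tokens
    return (
        input_tokens * INPUT_USD_PER_MTOK
        + kind.output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


def spend_by_kind(total_messages: int) -> dict[str, float]:
    """Total spend per message kind when everything hits the same model."""
    return {
        kind.name: total_messages * kind.share * cost_per_call(kind)
        for kind in TRAFFIC
    }


if __name__ == "__main__":
    spend = spend_by_kind(1_000_000)
    total = sum(spend.values())
    for name, usd in spend.items():
        print(f"{name:17s} ${usd:10.2f}  ({usd / total:.0%} of spend)")
```

Under these made-up numbers, the greeting and off-topic rows take a noticeable share of the bill even though neither needs product knowledge, which is the cost side of the trap described above.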