From the course: Agentic AI and Autonomous Development
Course overview
- [Lecturer] Welcome to the course, Agentic AI and Autonomous Development. We're going to talk about how you can build solutions that go from zero to production in 90 minutes or less. This course gets to the core principles, and also looks at the math and the latest scientific research on agents. Let's start with the problem. One of the real problems with using these large language models is that you can develop a very expensive habit: extreme API costs. For example, if you're fully saturating an API, you might be approaching $10,000 an hour, because you have lots of different agents running, and these agents are churning and stuffing huge context into whatever it is you're calling. You also have a problem with determinism. Because you're using such a large model, a lot of the things you're doing don't work, so you have to retry over and over again, and you have, in many ways, an infinite failure scenario. The Actor Model is a great way to recompose the problem, reframe it, and think about things in terms of what we've already done with traditional software engineering best practices. The Actor Model, as implemented in Erlang at Ericsson, gives you supervision, extremely high throughput, and zero shared memory. For a good example of what success looks like, WhatsApp was originally built with Erlang: around 50 engineers developed a system that could handle 2 billion users. The architectural choices and the design, where concurrency played a big role, were a huge factor in WhatsApp's success. If we look at best practices for modern software engineering with agents, one of the things we need to control for is determinism.
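The supervision idea from the Actor Model can be sketched in a few lines. This is a minimal illustration in plain Python, not Erlang/OTP and not any specific agent framework's API; the worker, its failure condition, and the restart limit are all assumptions for the example.

```python
# "Let it crash" supervision, sketched in plain Python: the worker doesn't
# handle its own errors; the supervisor restarts it instead.

def flaky_worker(task):
    """A worker that fails on bad input, like an agent choking on a prompt."""
    if task == "bad input":
        raise ValueError("worker crashed")
    return f"done: {task}"

def supervise(task, max_restarts=3):
    """Restart the worker on failure instead of handling errors inside it."""
    for attempt in range(1, max_restarts + 1):
        try:
            return flaky_worker(task)
        except ValueError:
            print(f"restart {attempt}: worker crashed, supervisor restarting")
    return None  # escalate after exhausting restarts

print(supervise("summarize file"))
```

The key design point is the separation of concerns: the worker stays simple because recovery lives entirely in the supervisor.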
We know that larger agents, especially, are less deterministic; because they're trying to solve so many different problems, they hallucinate. One way to gate that is a quality gate that looks at the technical debt gradient of the code. You score the code and say, "This particular file is an A file, or it's a B file, or it's an F, it's failed; we need to fix things." Another thing to look at is complexity: how many conditional statements are in your code. You should be building with a quality gate where complexity doesn't exceed 10. Also, what about self-admitted technical debt? This is another anti-pattern that's very easy to fall into when you're building things with large language models: the agent is trying to finish, it's in a hurry, and it puts in a to-do statement. Before you know it, you have a hundred or a thousand to-do statements in your production system. All those to-do statements are defects. We need to prevent any of that self-admitted technical debt from forming. It's also important to think about the concept of zero tolerance for defects. The Toyota Way is one example of that in the automobile industry, where they've developed systems that prevent defects. We look at things like Six Sigma as well, where you're looking for ways to eliminate the systemic workflows that create defects. With something like cargo test, for example, you could make sure that releases are gated on test quality, or you can look at PMAT and its quality gate and say, "Look, if these particular measurements are not in place, we're not going to release the software." Complexity, self-admitted technical debt, churn, code coverage: these are ways to enforce zero tolerance for defects, and this is a very different way to think about developing software.
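A quality gate like the one described here can be sketched as a toy checker. The regex-based complexity count, the debt markers, and the threshold of 10 are simplifications assumed for illustration; PMAT's actual analysis is far more sophisticated.

```python
# A toy quality gate: fail if a file's rough cyclomatic complexity exceeds 10
# or it contains self-admitted technical debt (TODO/FIXME-style comments).
import re

BRANCH_KEYWORDS = re.compile(r"\b(if|elif|for|while|and|or|except|case)\b")
DEBT_MARKERS = re.compile(r"#\s*(TODO|FIXME|HACK|XXX)", re.IGNORECASE)

def gate(source: str, max_complexity: int = 10) -> list[str]:
    """Return a list of violations; an empty list means the gate passes."""
    violations = []
    # Rough cyclomatic complexity: 1 + number of branch points.
    complexity = 1 + len(BRANCH_KEYWORDS.findall(source))
    if complexity > max_complexity:
        violations.append(f"complexity {complexity} exceeds {max_complexity}")
    for lineno, line in enumerate(source.splitlines(), start=1):
        if DEBT_MARKERS.search(line):
            violations.append(f"line {lineno}: self-admitted technical debt")
    return violations

clean = "def add(a, b):\n    return a + b\n"
dirty = "def f(x):\n    # TODO: handle negatives\n    if x:\n        return x\n"
print(gate(clean))   # passes: no violations
print(gate(dirty))   # fails: flags the TODO
```

In a real pipeline, a non-empty violation list would fail the build, which is exactly how zero tolerance is enforced: the defect never merges.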
Instead of thinking, "I've developed 1,000 lines of code, or 2,000 lines of code, or 10,000 lines of code," or "I developed lots of new features," we want to develop zero defects. This is a different framing of the problem so that you don't create defects. In this course, we also talk about large language models versus small language models. There's a lot of research comparing small language models, at roughly 1% of the cost, against large language models. If the large model and the small model produce equal results, why would you pay a hundred times more? This becomes a huge factor as you build out different solutions: if you can get better inference performance and equal accuracy, the small language model becomes a much more compelling offering. The big takeaway is that small models win. They're 10 to 30 times faster for inference. They're 1% of the cost. And you can deploy on device, which starts to solve a very significant problem in terms of sovereignty. Many regions of the world don't want to send their data to other regions. You may also be concerned with the laws: you don't want to send data somewhere it could be treated differently than your organization is legally required to treat it. So small language models solve security, they solve inference speed, and they solve a cost problem. Now, when you're working with, say, lots of small language models, look at the architecture of a distributed system like Erlang's, with a supervision tree: the idea is that you let things crash, because supervision allows you to easily start things up again. So if you have a bad input to a small language model, you just restart it and things are back to normal. The cost target is another interesting way to think about building solutions.
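The "1% of the cost" claim is easy to see with back-of-the-envelope arithmetic. All prices and token volumes below are hypothetical placeholders, not real vendor rates; only the 100x ratio comes from the discussion above.

```python
# Back-of-the-envelope cost comparison for a fleet of agents at high
# utilization. Prices and volumes are assumed figures for illustration.

large_cost_per_1k_tokens = 0.03                              # assumed rate
small_cost_per_1k_tokens = large_cost_per_1k_tokens * 0.01   # "1% of the cost"

tokens_per_day = 500_000_000  # assumed daily token volume across all agents

large_daily = tokens_per_day / 1_000 * large_cost_per_1k_tokens
small_daily = tokens_per_day / 1_000 * small_cost_per_1k_tokens

print(f"large model: ${large_daily:,.2f}/day")
print(f"small model: ${small_daily:,.2f}/day")
```

At the same accuracy, that two-orders-of-magnitude gap in daily spend is what turns model choice into an architectural decision rather than a detail.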
Instead of building solutions only for accuracy, or only around the largest model you can get, a different way to think about building solutions is to say, "How do I lower the cost to almost zero, so that I have a cost target that makes this essentially a fixed cost? Then I can run things the way I would run a server or a MySQL query." If you start to think about things from a fixed-cost perspective, suddenly things become much more compelling. Another emerging idea in agentic coding systems is the subagent: one agent, one task, isolated state, and parallel execution. If you notice, this is the identical pattern of a distributed system with a supervisor, and this is why supervisors, and all the history of supervision trees and high-performance systems, fit so well as a pattern match. Another pattern we talk about in this course is Amdahl's Law. Amdahl's Law is really important because if you extrapolate it to agents and subagents, you can see that human attention becomes the bottleneck, not just disk I/O or network I/O as in the classic case. In the case of Amdahl's Law for agents, human attention becomes the bottleneck because a human can only review so much: maybe three agents, or six agents, is the capacity of a human. If you think about things from that perspective, quality becomes much more important than quantity. Maybe six really good agents are better than 20 really noisy agents that a human can't pay attention to. In terms of simple versus complex, this is part of Floridi's conjecture: the more complex a system gets, the more you have to limit the scope.
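Amdahl's Law as applied above can be computed directly: treat human review as the serial fraction of the work, and watch the speedup plateau no matter how many agents run in parallel. The 20% review fraction is an assumed number for illustration.

```python
# Amdahl's Law: overall speedup = 1 / (s + (1 - s) / n), where s is the
# fraction of work that stays serial (here, human review of agent output).

def amdahl_speedup(serial_fraction: float, n_workers: int) -> float:
    """Overall speedup with n parallel workers and a fixed serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)

serial = 0.20  # assumed: human attention is 20% of the total work
for agents in (1, 3, 6, 20):
    print(f"{agents:>2} agents -> speedup {amdahl_speedup(serial, agents):.2f}x")
# Speedup can never exceed 1/0.20 = 5x, however many agents you add.
```

Going from 6 to 20 agents buys very little here, which is the quantitative version of "six really good agents beat 20 noisy ones."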
If you want something simple, you're also going to face the question of how you extend that simple system when you need more. One way we think about this is that simple systems will outperform complex systems, because you can compose many different simple systems together. The complex system, because it's trying to do so many different things, is going to have lower accuracy, and that lower accuracy becomes a problem over time. A calculator, for example, does one thing and does it well, with essentially 100% accuracy. Ask, say, GPT-4 to do the same thing and it's around 60% accurate. So here's the formula for building production systems quickly: if we use actors, that's one way to get a high-performance system; if we enforce quality from the very beginning, using tools like PMAT or the Toyota Way, we prevent defects; if we use smaller language models, we can deploy more quickly and more cost-effectively; and finally, if we use an overarching software engineering architecture that has worked well in the past, supervision trees, that's a great way to get things done. The result, if you put all these things together, and these are some of the concepts we'll cover in the course, is a low-cost and deterministic solution. Maybe not exactly 100% deterministic, but approaching 100%, especially if the task has been narrowed so that a small language model can tackle it. All right, we have a lot to cover in this course. Let's go ahead and get started.
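The simple-versus-complex argument comes down to how errors compound across chained steps. The per-step figures below (0.99 for a narrow, calculator-like component; 0.60 for a broad model) are illustrative assumptions in the spirit of the numbers above, not benchmark results.

```python
# Errors compound multiplicatively across chained steps, so per-step accuracy
# dominates end-to-end reliability.

steps = 5                     # hypothetical number of chained steps in a task
narrow_tool_accuracy = 0.99   # assumed: a single-purpose, near-perfect component
broad_model_accuracy = 0.60   # assumed: a general model attempting every step

pipeline_simple = narrow_tool_accuracy ** steps
pipeline_complex = broad_model_accuracy ** steps

print(f"5 simple components chained: {pipeline_simple:.1%}")
print(f"5 broad-model steps:         {pipeline_complex:.1%}")
```

Five near-perfect components chained together still succeed about 95% of the time, while five 60%-accurate steps succeed under 8% of the time, which is why narrowing the task for a small model pays off so dramatically.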