From the course: Foundations of AI and Machine Learning for Java Developers
Different ways of using LLMs - Java Tutorial
- [Instructor] There are four ways of using a large language model, and we can consider them in two different categories. One category, denoted here, involves only using text strings, as we have seen with prompt engineering. This category does not modify the weights in the underlying neural network; we are just dealing with strings, so typically no data science expertise is required. Of course, understanding the underpinnings of machine learning helps, but it's not necessary. You are sending prompts with context to the underlying model. A retrieval-augmented generation, or RAG, system is effectively a prompt technique, as we have seen before, except here the context is retrieved from an external data store. RAG systems are very popular and surprisingly effective. Note that for large deployments with many users, a substantial amount of text flows between the application and the language model, which can be quite costly. The other category, denoted here, does require you to have data science expertise and substantial financial means. Here you're actually changing the weights of the underlying neural network, so you really have to understand data science. Creating a foundation model is an extremely expensive task; only a few major companies on the planet have the resources to train one. A fine-tuned model builds upon a foundation model. You may want to create a fine-tuned model that is specifically trained in a certain domain, for example, financial information, healthcare, or software development. The advantage of these models is that you don't constantly duplicate prompt context or behavior for all your users; much of that information can be baked into the model. Let's look at the four basic ways of talking to a model. We've already seen the first way: prompt techniques, or prompt engineering, are used to craft a prompt. You add some context, and then you send that along to the model.
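To make the "it's just strings" point concrete, here is a minimal sketch in Java of crafting a prompt with context. The class and method names (`PromptBuilder`, `buildPrompt`) are illustrative assumptions, not a real SDK; a real application would send the resulting string to a model's API.

```java
// Illustrative sketch: from a programming point of view, prompt engineering
// is string manipulation. Names here are hypothetical, not a real library.
public class PromptBuilder {

    // Combine instructions, context, and the user's question into one prompt string.
    public static String buildPrompt(String instructions, String context, String question) {
        return instructions + "\n\nContext:\n" + context + "\n\nQuestion: " + question;
    }

    public static void main(String[] args) {
        String prompt = buildPrompt(
                "Answer using only the context below.",
                "Our store is open 9am-5pm on weekdays.",
                "When does the store open?");
        // This assembled string is what would be sent to the model.
        System.out.println(prompt);
    }
}
```

The key observation is that nothing here touches the model's weights; the program only assembles text before sending it.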
Basically, from a programming point of view, it's string manipulation. Another way of talking to a model is the leftmost arrow here: like the rightmost arrow, you're using prompt techniques, but you send prompts to a fine-tuned model, which builds on a foundation model. The third way is a very popular approach, retrieval-augmented generation, or RAG. If you remember from our discussion of context, the more appropriate context that is added to the prompt, the better we can guide the model to give us a useful response. In a previous video, we manually copied and pasted context into a prompt to get a good completion. RAG is a similar concept, except we now use an external data store to supply the context. The last way is to use prompt engineering and RAG to talk to a fine-tuned model, which in turn builds on a foundation model. While prompt engineering techniques are very popular, particularly RAG systems, there are specific use cases for creating fine-tuned models. You may want to start with simple prompt engineering first, then try a RAG system, before you decide whether you want to create a fine-tuned model.
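The RAG flow described above can be sketched in a few lines of Java. This is a toy, assumed implementation: the in-memory map stands in for an external data store (in practice usually a vector database queried by embedding similarity), and the keyword lookup stands in for semantic retrieval.

```java
import java.util.Map;

// Minimal RAG sketch (illustrative, not a real framework): retrieve a relevant
// snippet from a "data store", then add it to the prompt as context.
public class RagSketch {

    // Toy data store; a real system would query a vector database instead.
    static final Map<String, String> STORE = Map.of(
            "hours", "The store is open 9am-5pm on weekdays.",
            "returns", "Items may be returned within 30 days with a receipt.");

    // Naive retrieval: pick the snippet whose key appears in the question.
    // Real retrieval would rank documents by embedding similarity.
    public static String retrieve(String question) {
        String q = question.toLowerCase();
        for (Map.Entry<String, String> e : STORE.entrySet()) {
            if (q.contains(e.getKey())) {
                return e.getValue();
            }
        }
        return "";
    }

    // Build the augmented prompt: retrieved context plus the original question.
    public static String buildRagPrompt(String question) {
        return "Context:\n" + retrieve(question) + "\n\nQuestion: " + question;
    }

    public static void main(String[] args) {
        // The augmented prompt is what would be sent to the language model.
        System.out.println(buildRagPrompt("What is your returns policy?"));
    }
}
```

Note that this is the same string manipulation as before; the only new step is fetching the context automatically instead of pasting it in by hand.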