From the course: Azure for Developers: Retrieval-Augmented Generation (RAG) with Azure AI
Fundamental concepts of RAG - Azure AI Services Tutorial
- [Instructor] Retrieval-augmented generation, or RAG, is a technique for supplementing a language model with data from an external source. Since language models are trained on a large set of data, mainly from the internet, they have no knowledge of your private corporate data, such as legal contracts, product manuals, customer information sheets, and software code. RAG solves this problem, allowing you to use a language model to get insights from your private data.

Think of RAG as an open-book exam. You bring all the relevant documents to help answer the questions without needing to memorize everything. Once the exam is over and you no longer have access to those documents, you can no longer answer the questions. RAG works on the same principle: answers come from the external documents you attach to the RAG solution. When those documents are detached, the system can no longer answer the same questions, since they're not stored in the model's internal memory.

As we develop RAG solutions, it is also important to understand the concepts of tokens and embeddings, as they help you understand how language models and RAG solutions work. Tokens play a huge part in RAG because language models such as OpenAI's are priced based on token usage. Whenever you give a language model a query or prompt in natural language, the system performs a process called tokenization to convert your words into numbers called tokens, since computers understand numbers, not words. The OpenAI tokenizer website is a very good guide: it shows you the token and character count for the text you enter, along with the token IDs assigned to each word.

The tokens are then converted into vectors and mapped into a vector space so that the model can learn the relationships between words and their meanings. This process is called embedding. Imagine a 3D space with vectors for words like dog, cat, bark, meow, and skateboard.
In this space, dog and cat are close because they're animals, and bark and meow are nearby because they're animal sounds. But skateboard is far from all of them because it's unrelated. This is a basic example of embeddings in three dimensions. In this course, we'll use an Azure OpenAI embeddings model, which produces vectors with hundreds or even thousands of dimensions. We will use this model to create the embeddings of our company data. Once you enter a prompt into a RAG system, it converts your prompt into tokens and vector embeddings and uses those to find relationships with your company data, allowing it to generate a coherent response. Now that you understand the concepts of RAG, tokens, and embeddings, we can discuss how RAG works and its architecture.
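As a rough, self-contained sketch of these two ideas, the snippet below tokenizes words into numeric IDs and then compares toy 3D embeddings with cosine similarity. The vocabulary, token IDs, and vectors are all invented for illustration: real tokenizers (such as OpenAI's) use byte-pair encoding over subwords, and real embedding models produce learned vectors with hundreds to thousands of dimensions.

```python
import math

# --- Tokenization (toy sketch) ---------------------------------------------
# The word-level vocabulary and token IDs below are made up for illustration.
vocab = {"dog": 101, "cat": 102, "bark": 103, "meow": 104, "skateboard": 105}

def tokenize(text):
    """Convert known words into (hypothetical) numeric token IDs."""
    return [vocab[w] for w in text.lower().split() if w in vocab]

print(tokenize("dog skateboard"))  # -> [101, 105]

# --- Embeddings (toy 3D sketch) --------------------------------------------
# Hypothetical 3D vectors; related words point in similar directions.
embeddings = {
    "dog":        [0.90, 0.80, 0.10],
    "cat":        [0.85, 0.75, 0.15],
    "bark":       [0.70, 0.20, 0.05],
    "meow":       [0.65, 0.25, 0.10],
    "skateboard": [0.05, 0.10, 0.95],
}

def cosine_similarity(a, b):
    """How closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# dog vs. cat: nearly parallel vectors, similarity close to 1
print(cosine_similarity(embeddings["dog"], embeddings["cat"]))
# dog vs. skateboard: unrelated, similarity much lower
print(cosine_similarity(embeddings["dog"], embeddings["skateboard"]))
```

A real RAG system applies the same comparison at scale: your query is embedded, its vector is compared against the embeddings of your company's document chunks, and the closest chunks are retrieved and handed to the model to generate the answer.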