From the course: LLM Foundations: Vector Databases for Caching and Retrieval Augmented Generation (RAG)

Answer questions with RAG

With the knowledge base set up, let's now try to answer some queries with RAG. We first need to set up search parameters for the vector search. We will use the same metric, L2, and a radius of 0.5 to find matches in the database. The query we will use is "What is gender bias?" The query first needs to be converted to its embedding representation. Then we instantiate a collection object and load up the collection.

Now we get to the search. We set up the search to return the top three answers. Do note that, depending on the chunk size, the answer may be spread across multiple chunks, so we need to retrieve enough chunks to assemble a complete answer. This number may need to be adjusted for the specific use case. Then we print the top result to see how well it matches the question. Let's now run the retrieval process. The top result has a distance of 0.2, and it does answer the gender bias question.

Next, we send the data to the LLM to get a concise answer. We first concatenate all the returned results into a single context. Then we create a prompt: a system prompt that asks the LLM to answer the query based only on the context, followed by the context and the query. We then print the final prompt that will be sent to the LLM. Let's prepare the prompt now. Finally, we send the prompt to the OpenAI LLM, get the response, and print it. Let's get the response now. As seen from the response, it does answer the original user question about gender bias. A minimal code sketch of both steps follows.
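Here is a minimal sketch of the retrieval step, assuming a local Milvus instance, a collection named rag_collection with a vector field called embedding and a text field called text, and an OpenAI embedding model. The field names, collection name, and encoder are illustrative assumptions; the course notebook may use different ones.

```python
from openai import OpenAI
from pymilvus import Collection, connections

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Connect to Milvus and load the collection into memory for search.
connections.connect(host="localhost", port="19530")
collection = Collection("rag_collection")  # hypothetical collection name
collection.load()

# Convert the query to its embedding representation.
query = "What is gender bias?"
query_embedding = (
    client.embeddings.create(model="text-embedding-3-small", input=[query])
    .data[0]
    .embedding
)

# Same metric as before (L2); a radius of 0.5 bounds how far a match may be.
search_params = {"metric_type": "L2", "params": {"radius": 0.5}}

# Retrieve the top three chunks, since an answer may span multiple chunks.
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",  # hypothetical vector field name
    param=search_params,
    limit=3,
    output_fields=["text"],  # hypothetical field holding the chunk text
)

# Print the top result and its distance from the query.
top_hit = results[0][0]
print(top_hit.distance, top_hit.entity.get("text"))
```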
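And a sketch of the generation step, continuing from the variables above. The exact system prompt wording and the chat model are assumptions, not the course's verbatim choices.

```python
# Concatenate the returned chunks into a single context.
context = "\n\n".join(hit.entity.get("text") for hit in results[0])

# Build the final prompt: instructions, then the context and the query.
system_prompt = (
    "Answer the user's query based only on the context provided. "
    "If the context does not contain the answer, say so."
)
user_prompt = f"Context:\n{context}\n\nQuery: {query}"
print(user_prompt)  # inspect the final prompt before sending it

# Send the prompt to the OpenAI LLM and print the response.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
)
print(response.choices[0].message.content)
```

This completes our example for building a RAG system with a Milvus vector database.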
