From the course: Beyond the Basics: Building Generative AI Java Applications Using LangChain4j
Searching an embedding space
- [Instructor] We're now going to make our RAG system actually find things. Let's use the low-level LangChain4j APIs in this video so you'll understand the details of a search. In the next chapter, we'll discuss how to search using the high-level LangChain4j APIs. We want to search the embedding store to find similar strings to add to our prompt. Here are the steps. Get the query from the user and get the embedding for that string. Remember to use the same embedding model you used to populate your embedding store, since you cannot mix embedding models. Now, let's search the embedding store. First, we need a search request, so using the embedding you got for the user's query, create an EmbeddingSearchRequest instance. As part of that request, specify the maximum number of search results you want and the minimum score. These parameters help find the most similar results, and you need to experiment to find the best values. You then search the embedding store with that request. Then you use the results. For each match, you can refine the selection further by using the match's metadata. Collect the best text segments, extract the text strings from those text segments, combine those strings with the user's original query string, and then send that to the language model.
Here's a diagram of the same process showing LangChain4j's abstractions for storing and retrieving strings from the embedding store. In the upper left-hand corner, we have an embedding model, which converts text into embedding vectors. Again, remember to use the same embedding model for ingestion and for queries. If you change embedding models at any point in your application's life, the embedding vectors won't align, and similarity calculations won't work correctly. Changing the embedding model may require re-ingesting all the data into your vector database, which could be a considerable task.
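The search steps described above can be sketched in plain Java, without LangChain4j, to show what the two request parameters do. This is a minimal illustration under stated assumptions: the class and method names (`EmbeddingSearchSketch`, `search`, `buildPrompt`) are hypothetical, the vectors are hand-written rather than produced by an embedding model, and scoring is plain cosine similarity, which may differ from how a particular embedding store computes its score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not LangChain4j code: a brute-force similarity search
// that honors a maximum result count and a minimum score.
class EmbeddingSearchSketch {

    /** One stored entry: an embedding vector plus the text it was computed from. */
    record Entry(float[] vector, String text) {}

    /** One search hit: the similarity score plus the matched text. */
    record Match(double score, String text) {}

    /** Cosine similarity of two equal-length vectors; 0.0 if either has zero length. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Keeps entries scoring at least minScore, best first, capped at maxResults. */
    static List<Match> search(List<Entry> store, float[] query, int maxResults, double minScore) {
        List<Match> matches = new ArrayList<>();
        for (Entry e : store) {
            double score = cosine(query, e.vector());
            if (score >= minScore) matches.add(new Match(score, e.text()));
        }
        matches.sort(Comparator.comparingDouble(Match::score).reversed());
        return matches.subList(0, Math.min(maxResults, matches.size()));
    }

    /** Combines the retrieved text with the user's original query into one prompt. */
    static String buildPrompt(String userQuery, List<Match> matches) {
        StringBuilder sb = new StringBuilder("Answer using the following context:\n");
        for (Match m : matches) sb.append("- ").append(m.text()).append('\n');
        return sb.append("\nQuestion: ").append(userQuery).toString();
    }

    public static void main(String[] args) {
        // Hand-written 2-D vectors stand in for real embeddings.
        List<Entry> store = List.of(
                new Entry(new float[] {1f, 0f}, "Solar eclipses occur at new moon."),
                new Entry(new float[] {0.8f, 0.6f}, "Use a solar filter when observing the sun."),
                new Entry(new float[] {0f, 1f}, "Dobsonian telescopes are popular with beginners."));
        float[] queryVector = {1f, 0f};
        List<Match> matches = search(store, queryVector, 10, 0.7);
        System.out.println(buildPrompt("When do solar eclipses happen?", matches));
    }
}
```

Raising the minimum score trims weaker matches and lowering the maximum result count keeps only the top of the ranking, which is why both values need tuning against your own data.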
The embedding store in the middle is our vector database layer. This LangChain4j abstraction makes it relatively easy to swap implementations. For example, you use an in-memory embedding store for demos and then switch to a production-grade store like Milvus, Pinecone, pgvector, Qdrant, or even MongoDB or Cassandra variants. You don't rewrite your retrieval code. You simply change the store implementation with the builder. And to search the embedding store to find similar strings, you need to use the same embedding model you used during ingestion to find the embedding vector for the user's query. You pass that embedding to an embedding search request with search parameters, and then you retrieve the result. So let's see this in action. Here I have a Java class that keeps its embeddings in memory. Inside my main method, the first thing I do is create an embedding store. In this example, I'm using the in-memory embedding store. Note the in-memory embedding store is not the fastest embedding store, but it works for our demo. I then create an instance of an embedding model, and in my examples I'm using OpenAI's embedding model. Then I load the embeddings from a file, amateurastronomy.text, which is just full of information about amateur astronomy. It's quite a long file. So I load those embeddings into my embedding store, and then I enter this little loop asking for queries from the user, making sure the query is not empty. Then I take the user's query and find the embedding vector for it. I add that embedding to my search request, and I also specify that I want a maximum of 10 results and that my results have to score at least 0.7. So ignore anything below 0.7. Now these values are just for my example. Then I take that search request and send it to the embedding store to find matches, and then I display my matches. So let's run this program.
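To make the "swap the store, keep the retrieval code" idea concrete, here is a small plain-Java sketch. Everything in it is an assumption for illustration: `VectorStore`, `InMemoryStore`, and `toyEmbed` are hypothetical names, not LangChain4j types, and `toyEmbed` is a keyword counter standing in for a real embedding model, so it matches words rather than meaning. The point is only that `retrieve` depends on the interface, and that ingestion and queries go through the same embedding function.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not LangChain4j code: retrieval written against an
// interface so the backing store can be replaced without touching it.
class SwappableStoreSketch {

    record Match(double score, String text) {}

    /** Minimal store abstraction; a database-backed class could implement it too. */
    interface VectorStore {
        void add(float[] vector, String text);
        List<Match> search(float[] query, int maxResults, double minScore);
    }

    /** Brute-force in-memory implementation, adequate for a demo. */
    static class InMemoryStore implements VectorStore {
        private final List<float[]> vectors = new ArrayList<>();
        private final List<String> texts = new ArrayList<>();

        public void add(float[] vector, String text) {
            vectors.add(vector);
            texts.add(text);
        }

        public List<Match> search(float[] query, int maxResults, double minScore) {
            List<Match> hits = new ArrayList<>();
            for (int i = 0; i < vectors.size(); i++) {
                double score = cosine(query, vectors.get(i));
                if (score >= minScore) hits.add(new Match(score, texts.get(i)));
            }
            hits.sort(Comparator.comparingDouble(Match::score).reversed());
            return hits.subList(0, Math.min(maxResults, hits.size()));
        }
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0.0; // no known words: no similarity
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static final String[] VOCAB = {"eclipse", "moon", "star", "telescope"};

    /** Toy stand-in for an embedding model: counts occurrences of VOCAB words. */
    static float[] toyEmbed(String text) {
        float[] v = new float[VOCAB.length];
        for (String word : text.toLowerCase().split("[^a-z]+")) {
            for (int i = 0; i < VOCAB.length; i++) {
                if (VOCAB[i].equals(word)) v[i]++;
            }
        }
        return v;
    }

    /** Ingestion uses the same embedding function as retrieval below. */
    static void ingest(VectorStore store, String... docs) {
        for (String doc : docs) store.add(toyEmbed(doc), doc);
    }

    /** Retrieval code: unchanged no matter which VectorStore is passed in. */
    static List<Match> retrieve(VectorStore store, String query) {
        return store.search(toyEmbed(query), 10, 0.7);
    }

    public static void main(String[] args) {
        VectorStore store = new InMemoryStore(); // the only line to change when swapping stores
        ingest(store,
                "A total eclipse is rare.",
                "The moon has phases.",
                "A telescope helps you see a star cluster.");
        for (String query : List.of("solar eclipse", "Pink Floyd")) {
            System.out.println(query + " -> " + retrieve(store, query));
        }
    }
}
```

In this toy setup a query with no vocabulary words embeds to a zero vector and matches nothing, which loosely mirrors an off-topic query falling below the minimum score.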
Okay, so let's enter a query. Right away my program found a match, a semantic match, when I typed in solar eclipse. So this string is similar to what the user asked. Let's try something else. I asked for Pink Floyd, and there were no matches found. Here's my third example, stellar phenomenon, and my program found several matches. Looks like about five or six matches, and notice that every score is above 0.7. Here's 0.738, 0.726, 0.719. So it found matches that were above 0.7. So this chapter showed the low-level APIs for full control and clarity. In the next chapter, you're going to see high-level abstractions that hide some of this boilerplate and integrate retrieval directly into your service layer.