From the course: Introduction to AI-Native Vector Databases
Performing semantic searches using Weaviate
Now that we have data in our vector database, let's see what's actually stored in it. Let's pull out one data object and inspect the question and answer. To summarize, we're going to start where we left off in the last video. Here we're loading in our data and printing out one object. We're instantiating Weaviate, and it's ready to go. We can confirm that by looking at the metadata. Everything is good here. Again, we don't need to understand what this output is, as long as the call returns something. This is just a quick sanity check that Weaviate is up and running. We're going to create the Question class as we did last time, set it up, and add our data to Weaviate one by one. So, we've got all ten questions and answers in there. We can make sure that all ten are in there by doing this quick count. And then we can extract three of those questions, as we talked about last time.
So, now, what we want to do is look at one particular question and answer and look at the vector representation behind it. Let's write a query that extracts one data point and gets us the vector for that data point as well. To do that, we're going to have a variable that stores the result. And we're going to say, client, I have a query for you: client dot query. We're going to specify which class this data point lies in, and which pieces of information we want to extract from our database. Remember, we have the category, the question, and the answer properties, so we're going to extract all three of these properties this time. Now, to specify that we want the vector as well, we're going to say with additional, and specify here that we want the vector for our data point.
We're going to specify vector. And then, to specify that we only want one data point, we're again going to use the with limit component of the query, and limit it to one data point. And we're going to perform this query. To format it a little more nicely, we're going to wrap it in parentheses, so that we can split it across lines and read it up and down the page. And this should be good to go. If we run this query, the results will be stored in our variable.
Before we look at the vector, let's have a look at which question we've extracted. And this is the question that we've extracted. Let's have a look at the answer as well. Right. So, the question is, "Weighing around a ton, the eland is the largest species of this animal in Africa." And the answer can also be extracted, and the answer is antelope.
So, now, the question is, what is the vector associated with this question, answer, and category? The vector is also in the result here. So, we can say result at the data key, then specify the class that we're looking at and the index of the data point that we're interested in. The vector is an additional property, so it sits underneath additional, under vector. If we run this, this is the group of numbers that captures the meaning behind this question and answer. It's a very, very long list of numbers, and this is the machine-understandable version of the data. The question and answer themselves are human-understandable data. You and I can read the question and understand it, read the answer and understand it. The computer understands the question and answer using this long list of numbers.
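The query and lookup described above might look roughly like the following. This is only a sketch: it assumes the v3-style Weaviate Python client that the method names in this walkthrough suggest (client.query.get, with_additional, with_limit, do), and it assumes the usual nested response layout with a "Get" level and an "_additional" key. The function names fetch_one_with_vector, extract_object, and extract_vector are hypothetical helpers, not part of Weaviate, and client is assumed to be an already connected client object.

```python
def fetch_one_with_vector(client, class_name="Question"):
    """Build and run a query for one object plus its stored vector.

    Assumption: `client` exposes the v3-style query builder chain.
    """
    return (
        client.query
        .get(class_name, ["category", "question", "answer"])
        .with_additional("vector")  # also return the object's vector
        .with_limit(1)              # only one data point
        .do()                       # perform the query
    )


def extract_object(result, class_name="Question", index=0):
    """Pull one object out of the nested response dictionary."""
    # Assumed layout: result["data"]["Get"][class_name] is a list of objects.
    return result["data"]["Get"][class_name][index]


def extract_vector(result, class_name="Question", index=0):
    """Return the vector stored under the object's "_additional" key."""
    return extract_object(result, class_name, index)["_additional"]["vector"]
```

With a connected client, calling fetch_one_with_vector(client), then reading extract_object(result)["question"] and extract_vector(result), would mirror the steps walked through here.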
And so, if we want to know how long this vector is, that is, its dimensionality, we can ask the computer to measure it. We can take the length of the vector that we're examining, so we copy-paste this expression in from above. This is going to do two things. First, it takes that vector and assigns it to the vector variable. Then it passes that vector variable into the length function, so that we can find out how long the vector is. In this particular case, the vector is 1536-dimensional. That's how many numbers we have in the vector representation up here. And this is the machine-understandable version of our data, which is pretty awesome.
So, we've done a couple of things here. We've extracted one question and looked at it. We've looked at the corresponding answer. And we've looked at the vector representation of that question and how long it is. It's quite large: a 1536-dimensional vector.
So, now, what we want to do is see if we can search for questions relevant to a particular concept. Here, we want to create a query that searches for biology-related questions. This is where we perform semantic search. We want to extract questions that are conceptually, or semantically, close to the concept of biology. We can write a query to do this, and we're going to format it nicely by sandwiching it between parentheses. We're going to say, client, we again have a query for you. We want you to get information from a particular class, and the class here is called Question. The properties that we're interested in displaying are all of them: the question, the answer, and the category. And then here is the secret sauce.
How we specify what concept the questions should be related to is by using this with near text query. There are a bunch of queries that we can run. The one that we're interested in here is with near text: because we're passing in a query as text, we want whatever is nearest to that text. So, here we're going to pass in an object where we say that the concept we want to be close to is biology. This is the concept that we want to extract questions close to. And here we'll limit it to two questions. We'll again use the with limit component of the query to say, only give me the two most related questions. And then we tell it to perform this operation. That should be good. So, we're going to take that response and, again, print it nicely so that we can see what comes back.
The query ran successfully, and two questions and answers came back. We can see them here. It gave back the answer, category, and question, because we asked it to return all three properties. And notice how the concept here is biology, and the questions and answers that we're getting back are related to biology. The first one has to do with DNA and Watson and Crick and their experiments. The second one has to do with species, so it's related to biology. And this is the main idea behind vector search. We're searching using concepts, and the database takes those concepts into account and returns similar concepts.
So, then, if we want to quantify how close this first object is to our query of biology, we can do that as well, and we can do the same for the second object. Measuring the distance between our query and every object that comes back is what we're going to do next. To do this, we can simply take the query that we've constructed up here and modify it slightly. There's not a very big change to this query. We're going to copy-paste that query in.
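The near-text search described above can be sketched as follows, again assuming the v3-style Weaviate Python client and its with_near_text builder method, which takes a dictionary with a "concepts" list. The names search_by_concept and pretty are hypothetical helpers introduced for illustration, and client is assumed to be an already connected client object.

```python
import json


def search_by_concept(client, concept, limit=2, class_name="Question"):
    """Return the objects whose vectors are closest to a text concept.

    Assumption: `client` exposes the v3-style query builder chain.
    """
    return (
        client.query
        .get(class_name, ["question", "answer", "category"])
        .with_near_text({"concepts": [concept]})  # e.g. "biology"
        .with_limit(limit)                        # keep only the closest few
        .do()
    )


def pretty(response):
    """Format a response dictionary as indented JSON for easier reading."""
    return json.dumps(response, indent=2)
```

A call such as print(pretty(search_by_concept(client, "biology"))) would correspond to the two-result biology search shown in the video.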
And the only additional thing that I'm going to do here is specify in my query that I want the database to do one more thing for me: return an additional piece of information, which is the distance. This returns the measured distance between my query, which is biology, and each of the objects that are returned. So, now, this looks good. We can run this query. Notice how we get back the same questions and answers, because they are the closest to our concept of biology. But we get this additional property here as well: the distance between the query and each object. In this case, the object that has to do with DNA, and in this case, the question and answer that has to do with species. This distance tells us how close or how far away the object is from the query. In this particular case, the closest object to our query has a distance of 0.19, and this one is slightly farther away, at 0.20. So the model that the vector database is using quantifies this object as slightly more relevant to the concept of biology than this one.
We can even go one step further and say, I want you to return everything and tell me the distance between each object and my query, which, in this case, we can set to be animals. Then we can see how relevant or irrelevant each object is to the query, and this will clarify the significance of that distance. So, here I've taken that query, changed the concept to animals, and set the limit to ten. Remember, we only have ten objects in the database, so it's going to return everything. We get back all ten objects here. What I want you to notice is what's happening to the distance: as I scroll down, the distance goes up and up. Now let me scroll back up. Notice that the query is animals.
The closest object to animals is this question and answer. It has to do with an elephant, and the distance is 0.18. The second closest is at 0.19 and has to do with a crocodile. The third one has to do with an antelope, and then you're getting further away. You're talking about species. You're talking about poisonous snakes here. Then you're talking about DNA. And notice how, as the distance increases, you start getting increasingly irrelevant concepts. Here you're talking about a wire, here about the liver, and here about the sound barrier. What this means is that as the distance between our query vector and the object vector increases, we hit a point beyond which the responses are irrelevant. A wire has nothing to do with our input query of animals.
So, if we notice this pattern in our returned results, we can tell the database to apply a threshold and only return results within a certain distance. How do we determine that distance threshold? Well, we can examine our returned results. All of these concepts are relevant, so a distance of 0.19 is relevant, 0.20 is relevant, 0.21 is relevant, 0.23 is relevant, and then there's 0.24, which is where we start talking about DNA and the wire. So, maybe the cutoff is that the distance should be no higher than 0.24. Maybe that can be our threshold to signal to the vector database where relevance cuts off. We can specify that so that we only get relevant results. We're going to set a max distance: max distance is equal to 0.24, as we just examined. And now, we're going to take the exact same query that we had earlier. So, we're not changing much.
I'm just going to copy-paste this exact same query, and we're going to modify it slightly so that the vector database knows to threshold using this max distance. The way we do that is over here: within the same object as the concepts, we also specify a distance, and we want this distance to be at most max distance. Then we can print out this response the same way we've been printing our responses, indented so that we can read it nicely. When I run this now, I only get responses that have a distance less than 0.24, so all of these are relevant. As soon as I hit that threshold of 0.24, I don't get anything more back. So, now, we've told our vector database a distance that signifies relevance to the query, and it cuts the results off at that point. And notice how, even though we've asked for ten objects, it limits the results to objects that are at most a distance of 0.24 away from our query vector. So, now, we only get back data points that are relevant to our query.
Now that we've got a good intuitive understanding of vector search and practical experience working with Weaviate, in the next challenge, we'll get you to add a larger data set to Weaviate and perform vector search.
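For reference, the thresholded search from this video can be sketched as follows. It assumes the v3-style Weaviate Python client, where the near-text dictionary accepts a "distance" key alongside "concepts" and with_additional can request the "distance" field. The names search_within_distance and returned_distances are hypothetical helpers introduced for illustration, client is assumed to be an already connected client object, and 0.24 is simply the cutoff chosen by eye above.

```python
def search_within_distance(client, concept, max_distance=0.24, limit=10,
                           class_name="Question"):
    """Run a near-text search that only returns objects within a distance cutoff.

    Assumption: `client` exposes the v3-style query builder chain.
    """
    near_text = {
        "concepts": [concept],     # e.g. "animals"
        "distance": max_distance,  # cutoff set next to the concepts
    }
    return (
        client.query
        .get(class_name, ["question", "answer", "category"])
        .with_near_text(near_text)
        .with_additional(["distance"])  # report each object's distance
        .with_limit(limit)              # upper bound; the cutoff may return fewer
        .do()
    )


def returned_distances(response, class_name="Question"):
    """List the distance reported for each returned object, in order."""
    objects = response["data"]["Get"][class_name]
    return [obj["_additional"]["distance"] for obj in objects]
```

Listing returned_distances(search_within_distance(client, "animals")) is one way to check that nothing beyond the chosen cutoff comes back, even though the limit asks for up to ten objects.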