From the course: OpenAI API for Python Developers
Create a vector store and embeddings (Chroma)
- [Instructor] Now, after loading and splitting the documents into chunks, we want to create embeddings and load them into the vector store. To be able to create embeddings, you have to make sure that you import the embeddings model into the scope. So I've added here, on line 12, the OpenAIEmbeddings model, and right here I have created an instance.

So let's go back. We're going to find here the next part of the example, which is the same basic example that we have been using. The next step is to create embeddings and load them into the Chroma vector store. We're going to take from this line all the way down to this line, copy it, and go back to our example. We just need to make a few adjustments. Here, this is going to correspond to the documents, so the chunks of documents. And for the embedding, let me replace the name. I'm going to keep it simple: it's just going to be named embeddings.

After that, we're going to search by similarity by passing a query. It's not going to be the one from this example. We're going to pass our user input, which is to ask the AI assistant, "Do you ship to Europe?" So that's going to be our query, and I'm going to rename the variable to user_query. We submit this user query to the vector store, which searches by similarity to retrieve the relevant information.

So let's try it and run the script with Python. You can see that it goes through, again, the process of splitting the documents into chunks. This time it also loads the chunks of documents into the Chroma DB vector store. Then we search by similarity based on the text input, which is to ask, "Do you ship to Europe?" And the answer is yes, and shipping costs may vary depending on the shipping destination.
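To make the load-then-search flow concrete, here is a minimal, offline sketch of what a vector store does with the chunks and the user query. It is not the Chroma or OpenAI implementation: ToyEmbeddings and TinyVectorStore are hypothetical stand-ins written for this illustration, the word-count "embedding" is far cruder than a real model, and the second and third FAQ chunks are made up. The method names embed_documents, embed_query, and similarity_search are chosen to mirror the LangChain calls used in the lesson.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class ToyEmbeddings:
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary."""

    def __init__(self, vocabulary):
        self.vocabulary = sorted(vocabulary)

    def embed_query(self, text):
        counts = Counter(tokenize(text))
        return [float(counts[word]) for word in self.vocabulary]

    def embed_documents(self, texts):
        return [self.embed_query(text) for text in texts]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store: keeps each chunk next to its vector."""

    def __init__(self, chunks, embeddings):
        self.chunks = chunks
        self.embeddings = embeddings
        self.vectors = embeddings.embed_documents(chunks)

    def similarity_search(self, query, k=1):
        """Embed the query text and return the k most similar chunks."""
        query_vector = self.embeddings.embed_query(query)
        ranked = sorted(
            zip(self.chunks, self.vectors),
            key=lambda pair: cosine_similarity(query_vector, pair[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]


# Illustrative chunks; only the shipping answer comes from the lesson.
documents = [
    "Do you ship to Europe? Yes, and shipping costs may vary depending on the shipping destination.",
    "What is your return policy? You can return any item within 30 days.",
    "How can I contact support? Send us an email at any time.",
]

vocabulary = {word for doc in documents for word in tokenize(doc)}
embeddings = ToyEmbeddings(vocabulary)
db = TinyVectorStore(documents, embeddings)

user_query = "Do you ship to Europe?"
results = db.similarity_search(user_query, k=1)
print(results[0])
```

With a real embedding model the ranking is based on meaning rather than shared words, but the shape of the flow is the same: embed the chunks once when loading, embed the query at search time, and return the closest chunks.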
And we're going to see that it corresponds exactly to this question relating to shipping information. We get the matching response based on the text input, so by loading the embeddings we were able to find and retrieve the relevant information for that input.

You can also search by similarity using a vector. You can find the example right below. On this page, the first example is to search by similarity with a text query, just like we've seen, and the second is to search by vector, because what we load into the vector store database are vectors, that is, the embeddings. And you're going to see here with the schema that the process is that we load and split the documents into chunks, which are then converted into vectors. Those vectors are then used to search by similarity whenever we make a query.
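The difference between searching with text and searching by vector can be sketched as follows. This is an illustrative, offline example rather than Chroma's code: the embed function is a toy word-count embedding, the chunks other than the shipping one are made up, and similarity_search_by_vector here is a local helper whose name mirrors the LangChain method. The point is only that the query is converted to a vector first, and the search then compares that vector against the stored ones.

```python
import math
import re
from collections import Counter

# Illustrative chunks; only the shipping answer comes from the lesson.
chunks = [
    "Do you ship to Europe? Yes, and shipping costs may vary depending on the shipping destination.",
    "What is your return policy? You can return any item within 30 days.",
    "How can I contact support? Send us an email at any time.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Fixed vocabulary so every vector has the same length and word order.
vocabulary = sorted({word for chunk in chunks for word in tokenize(chunk)})


def embed(text):
    """Toy embedding (an assumption for this sketch): one word count per vocabulary entry."""
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search_by_vector(query_vector, stored, k=1):
    """Rank stored (chunk, vector) pairs against an already-computed query vector."""
    ranked = sorted(
        stored,
        key=lambda pair: cosine_similarity(query_vector, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]


# Load step: each chunk is converted to a vector and kept next to its text.
stored = [(chunk, embed(chunk)) for chunk in chunks]

# Query step: embed the question ourselves, then search with the vector.
query_vector = embed("Do you ship to Europe?")
results = similarity_search_by_vector(query_vector, stored, k=1)
print(results[0])
```

Searching by vector is useful when the query embedding already exists, for example when it was computed earlier or comes from another document, so the text does not need to be embedded again.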