From the course: LLM Foundations: Vector Databases for Caching and Retrieval Augmented Generation (RAG)

Insert data into Milvus

Having created a collection, let's now insert some data into Milvus. In this video, we will discuss the preprocessing and insertion steps needed. For this demonstration, we have a CSV file called course-descriptions.csv, which contains a list of five courses. The columns are course ID, title, and description. We will insert this data into Milvus, and we will also create embeddings for the description column for future search operations.

Back to the notebook. First, we load the CSV file into a DataFrame using pandas and then display the top records. Running this code, we can see that it is properly loaded.

To create embeddings, we will use OpenAI's sentence embedding model, which returns an embedding with a dimension of 1536. We set up the API key for OpenAI; it is recommended that you get your own API key and use it in the code. We then initialize the OpenAI embeddings function from LangChain for getting embeddings. If you use a free trial account, you may run into rate limits, so a paid account is recommended for running these exercises. Let's run this code now.

To insert data into Milvus, we need to convert each column in the DataFrame into a list. Do note that this is column by column, not row by row. We do that for the course ID, title, and description columns. Then we create the list of embeddings for the descriptions by calling the embed_query function on each description. Finally, we form a list of all the column lists, in the same sequence as the corresponding fields in the collection schema.

To insert, we first initiate a collection object by passing the collection name and the connection ID. Alternatively, we can use the same collection object that we created when creating the collection earlier. To insert data, we call the insert function with the input list. When this insert statement is executed, the new data items are only loaded into the processing queue. Milvus will asynchronously process them and insert them into the collection. This may take some time, because the records stay pending until the pending data reaches a specific size. To force this processing, we call the flush function to immediately insert the records. Flush may take some time to run, depending on the size of the pending data. Let's run this code now. The data is now inserted into the collection.
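The preprocessing and insertion steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. The field names (course_id, title, description), the schema order, the sample rows, and the helpers build_insert_columns, insert_and_flush, and fake_embed are all assumptions made for this example. The stand-in embedding function keeps the sketch runnable without an API key. In the real notebook, the embed argument would be the embed_query function of the LangChain OpenAI embeddings object, and the collection would be a Milvus collection object.

```python
from typing import Callable, Dict, List, Sequence


def build_insert_columns(
    rows: Sequence[Dict[str, object]],
    embed: Callable[[str], List[float]],
) -> List[list]:
    """Convert row records into column lists (column by column, not by row).

    The returned order is assumed to match the collection schema:
    course ID, title, description, description embedding.
    """
    course_ids = [row["course_id"] for row in rows]
    titles = [row["title"] for row in rows]
    descriptions = [row["description"] for row in rows]
    # One embedding per description, produced by a single-text embed function.
    desc_embeddings = [embed(str(text)) for text in descriptions]
    return [course_ids, titles, descriptions, desc_embeddings]


def insert_and_flush(collection, columns: List[list]) -> None:
    """Queue the column lists for insertion, then force processing."""
    collection.insert(columns)  # records are queued, not yet persisted
    collection.flush()          # force the pending records to be processed


def fake_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a real embedding call (3 dims, not 1536)."""
    return [float(len(text)), 0.0, 1.0]


# Made-up sample rows, standing in for the contents of the CSV file.
rows = [
    {"course_id": 1001, "title": "Sample course A", "description": "First sample description."},
    {"course_id": 1002, "title": "Sample course B", "description": "Second sample description."},
]

columns = build_insert_columns(rows, fake_embed)
```

With a real Milvus connection, insert_and_flush would be called with the collection object and the columns list. The column order matters because each list is matched to a field by its position in the schema.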
