From the course: Vector Databases in Practice: Deep Dive
Create an object collection
- [Instructor] What we need to do next is create a framework for our data. In Weaviate, this is called creating an object collection, and it's a little bit like creating a table in a relational database. A collection definition is where we define the structure of the data: what properties each object in the collection will contain, and their data types, such as text, numbers, and so on. This is also where we define how the database will work with the data. This part is a little bit like choosing a set of tools. For example, we'll define which vectorizer model will be used, which generative model to use for RAG applications, and how the database is going to organize, or index, the data for optimized queries. So when you make these decisions, just like building something in the real world, it's good to think about the end goal so that you can choose the right tools for the job.
Vector databases can deal with almost any kind of data, but for this course, we'll build a movie database, and later on we'll even build a web app on top of it too. Who doesn't like movies, right? As mentioned before, we have a synthetic dataset of movies. The dataset includes a host of data for each movie, like the title, a brief description, a longer synopsis, the year, and the rating, and it even includes critics' reviews. Very handy for avoiding the terrible films, even if they're made up.
Now let's begin to create our data collection. Seeing as it's a movie database, we'll start with the collection for individual movies. First, we'll need a name. Let's call it Movie, after the individual objects. Then let's define how the data will be vectorized and what generative AI tool to integrate with. In Weaviate, this is done through the concept of modules. As mentioned before, we'll use OpenAI here, but it could be any number of different modules. You could also specify particular models, but we'll stick with the default ones for now.
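The instructor's on-screen code is not part of this transcript, so here is a minimal sketch of what the first part of such a definition might look like as plain data, assuming Weaviate's JSON-style schema format. The module names `text2vec-openai` and `generative-openai` are assumed to be the OpenAI modules referred to above, and the empty settings stand in for "use the default models".

```python
import json

# Hypothetical sketch: a collection definition expressed as a plain dict in
# Weaviate's JSON-style schema format. No database connection is made here.
movie_definition = {
    "class": "Movie",                 # collection name, after the individual objects
    "vectorizer": "text2vec-openai",  # assumed module used to vectorize the data
    "moduleConfig": {
        "text2vec-openai": {},        # empty config: keep the default model
        "generative-openai": {},      # assumed generative module for RAG
    },
}

# Print the definition so it can be inspected before sending it anywhere.
print(json.dumps(movie_definition, indent=2))
```

Keeping the definition as data makes it easy to review the choice of tools before anything is created on the database.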
Next, we define properties, which are like columns in SQL. Each property needs a name and a data type at a minimum. So we'll add properties for the title and description as text. We'll also save the movie row ID, so that we can easily identify each movie, and the movie year. These are whole numbers, so we'll use the INT data type. Now, I know that the rating data is in decimals, so let's set that as a floating-point number, which is called NUMBER in Weaviate. The last property is the director, which is text. This will let us filter movies by a particular director's name, for example.
But let's pause to consider: does the name of the director add much to the meaning of the movie vector? For me, I'd want to use this collection just for searching movies by their title and description. So what we can do is set Weaviate to skip this property when determining the object's vector. Commonly, you might also do this with text like product IDs or URLs, basically any text that doesn't carry much meaning but might add noise to your vector.
Finally, we'll make sure to close our connection. And that's it. You can run this code to create the collection for movies in your database. Now, of course, we don't yet have any data in this collection, so let's go ahead and add some.
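The property list described here could be sketched as follows, again as plain data in Weaviate's JSON-style schema format rather than the course's actual client code, which the transcript does not show. The property names (`movie_id`, `year`, and so on) and the `make_property` helper are hypothetical, and the skip setting assumes the `text2vec-openai` vectorizer module.

```python
import json

def make_property(name, data_type, skip_vectorization=False):
    """Build one property entry (hypothetical helper, for illustration only)."""
    prop = {"name": name, "dataType": [data_type]}
    if skip_vectorization:
        # Ask the vectorizer module to leave this property out of the vector.
        prop["moduleConfig"] = {"text2vec-openai": {"skip": True}}
    return prop

# Property names are assumptions; the data types follow the narration.
movie_properties = [
    make_property("title", "text"),
    make_property("description", "text"),
    make_property("movie_id", "int"),    # row ID, a whole number
    make_property("year", "int"),        # release year, a whole number
    make_property("rating", "number"),   # floating-point value
    make_property("director", "text", skip_vectorization=True),
]

# List the properties that are explicitly excluded from the vector.
skipped = [
    p["name"]
    for p in movie_properties
    if p.get("moduleConfig", {}).get("text2vec-openai", {}).get("skip")
]

print(json.dumps(movie_properties, indent=2))
print("Skipped for vectorization:", skipped)
```

The director stays available for filtering because it is still stored as a property; only its contribution to the vector is switched off.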