From the course: LLM Foundations: Vector Databases for Caching and Retrieval Augmented Generation (RAG)
Collections in Milvus
We will now discuss some key storage concepts in Milvus. We begin with databases and collections in this video.

What are databases in Milvus? A database is used to logically group similar entities and data. This organization is similar to other database systems, where the concept of a database, a tablespace, or a schema is used to group similar objects. Each Milvus instance can manage multiple databases. A single instance can have up to 64 databases. The default database in Milvus is called default, and it is created automatically. If a new entity is created without specifying a database name, it is stored in the default database.

A database serves as a container for data. It stores collections, partitions, and indexes. In Milvus, access control is implemented at the database level. Users can be created and configured at a database level. Roles can also be created for each database with specified permissions and then assigned to users. Databases also help to support multitenancy in Milvus. Each tenant can be given its own database, and data belonging to that tenant can be stored there. This provides the highest level of tenant isolation within a Milvus instance.

We then move on to Milvus collections. A Milvus collection is like a table in a traditional database. It is the logical entity that is used to store and manage data. Each collection has a defined schema, and a collection is created by providing that schema. The schema can also be modified later, with certain restrictions. Each field in a schema is similar to a column in a table. A field has its own datatype, size, and default value, similar to a database column. Milvus supports several datatypes, covering both scalar and vector data. A given collection can have a combination of scalar and vector fields. One of the scalar fields can be set as the primary key for the collection. This field cannot have duplicate values. Primary key values can also be auto-generated if needed. This is defined in the schema.
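To make the schema and primary key ideas concrete, here is a minimal sketch in plain Python. It is a toy model only, not the Milvus client API: the names `Field`, `ToyCollection`, `auto_id`, and `insert` are hypothetical and chosen for illustration. It models the uniqueness rule as described above; how a real Milvus deployment treats repeated keys on insert should be checked against the Milvus documentation.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Field:
    """One schema field, comparable to a column in a table (hypothetical)."""
    name: str
    dtype: str                 # e.g. "INT64", "VARCHAR", "FLOAT_VECTOR"
    is_primary: bool = False


@dataclass
class ToyCollection:
    """Toy stand-in for a collection: a schema plus rows keyed by primary key."""
    name: str
    fields: list
    auto_id: bool = False      # assumption: keys are generated when True
    rows: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def __post_init__(self):
        primaries = [f.name for f in self.fields if f.is_primary]
        if len(primaries) != 1:
            raise ValueError("exactly one primary key field is required")
        self.primary = primaries[0]

    def insert(self, row):
        row = dict(row)
        if self.auto_id:
            # Generate the key instead of taking it from the caller.
            row[self.primary] = next(self._ids)
        if set(row) != {f.name for f in self.fields}:
            raise ValueError("row does not match the schema fields")
        key = row[self.primary]
        if key in self.rows:
            raise ValueError(f"duplicate primary key: {key!r}")
        self.rows[key] = row
        return key


books = ToyCollection(
    "books",
    [Field("id", "INT64", is_primary=True), Field("title", "VARCHAR")],
    auto_id=True,
)
first_key = books.insert({"title": "Vector search basics"})
second_key = books.insert({"title": "Caching with embeddings"})
```

With `auto_id=True`, the caller leaves out the key and the collection assigns it. With `auto_id=False`, the caller supplies the key, and a repeated value is rejected.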
Milvus also supports dynamic fields, which allow ad hoc fields that are not declared in the schema to be added during data inserts.

What are the datatypes supported in Milvus? On the scalar side, it supports 8-, 16-, 32-, and 64-bit integers. Float and double are supported for floating-point storage. VARCHAR and Boolean are available to store string and Boolean values, respectively. Milvus also supports complex datatypes like JSON and array to store data in these formats. On the vector side, a binary vector is used to store vectors with just binary values. A float vector is used to store floating-point values, which is the form most embeddings take.
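The datatypes and dynamic fields described above can be sketched in plain Python. This is an illustrative model only, not the Milvus client API: `check_value` and `split_row` are hypothetical helpers, the type names are written as upper-case strings for convenience, and the checks are simplified. The binary vector check assumes bits are packed eight per byte, so a vector of dimension `dim` occupies `dim / 8` bytes.

```python
import json

# Signed integer widths for the scalar integer types.
INT_BITS = {"INT8": 8, "INT16": 16, "INT32": 32, "INT64": 64}


def check_value(dtype, value, dim=None):
    """Return True if value fits this toy rule for the given datatype."""
    if dtype in INT_BITS:
        bits = INT_BITS[dtype]
        low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high
    if dtype in ("FLOAT", "DOUBLE"):
        return isinstance(value, float)
    if dtype == "BOOL":
        return isinstance(value, bool)
    if dtype == "VARCHAR":
        return isinstance(value, str)
    if dtype == "JSON":
        try:
            json.dumps(value)
            return True
        except (TypeError, ValueError):
            return False
    if dtype == "ARRAY":
        return isinstance(value, list)
    if dtype == "FLOAT_VECTOR":
        return (isinstance(value, list) and len(value) == dim
                and all(isinstance(x, float) for x in value))
    if dtype == "BINARY_VECTOR":
        # Assumption: dim bits packed into dim / 8 bytes.
        return (isinstance(value, bytes) and dim is not None
                and dim % 8 == 0 and len(value) == dim // 8)
    raise ValueError(f"unknown datatype: {dtype}")


def split_row(schema_names, row):
    """Separate declared schema fields from ad hoc (dynamic) fields."""
    declared = {k: v for k, v in row.items() if k in schema_names}
    dynamic = {k: v for k, v in row.items() if k not in schema_names}
    return declared, dynamic


declared, dynamic = split_row(
    {"id", "embedding"},
    {"id": 1, "embedding": [0.1, 0.2, 0.3], "source": "faq"},
)
```

Here `source` is not part of the schema, so it ends up in `dynamic`, which mirrors the idea of a field added ad hoc at insert time.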