From the course: LLM Foundations: Vector Databases for Caching and Retrieval Augmented Generation (RAG)
Combine vector and scalar data
Any use case that requires vector support may also need scalar data to be used with it. Scalar data is needed to implement additional filters for authentication, multitenancy, or context. Do we keep scalar and vector data together, or do we separate them and store them in databases that are built for their purposes? For example, do we simply use Milvus for both datatypes, or do we use a MySQL instance for scalar data and link the two?

What do specialized databases bring to the table? They have excellent support for vector search. This includes specialized data storage, support for embeddings, distance metric types, and optimized query plans for vector search. On the other hand, they lack the extensive query capabilities that traditional databases provide, like joins, aggregations, and functions.

So what criteria do we use to determine if we want to keep scalar and vector data together, or have specialized databases for each of them? The first one is whether we need hybrid searches that combine scalar and vector data. What kinds of search conditions will be used in your use case? Will they require aggregations and summaries? For simple hybrid searches, such as a vector search restricted by a tenant ID, we can go with just a vector database. For more complex ones, we need to store the datatypes separately and implement multi-step searches.

Should we keep scalar and vector data in separate databases? Will keeping them together fulfill the requirements for the queries? We need to analyze the use case to determine this. Choose the strategy carefully, as there are significant long-term implications whichever way we go.
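
To make the multi-step search idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how Milvus or MySQL must be wired together: `sqlite3` stands in for the relational database, a plain dictionary stands in for the vector database, and the table layout, sample rows, and function names (`build_stores`, `multi_step_search`, `topic_counts`) are all hypothetical. The scalar store answers the filter and the aggregation, and the vector step only ranks the documents that survived the filter.

```python
import math
import sqlite3

# Hypothetical sample data: (id, tenant, topic, embedding).
# Scalar attributes go to a relational store (sqlite3 here, standing in
# for MySQL); embeddings go to a separate vector store (a plain dict here,
# standing in for a vector database). The shared id links the two.
DOCS = [
    (1, "tenant_a", "billing", [1.0, 0.0, 0.0]),
    (2, "tenant_a", "support", [0.9, 0.1, 0.0]),
    (3, "tenant_b", "billing", [1.0, 0.0, 0.0]),
    (4, "tenant_a", "billing", [0.0, 1.0, 0.0]),
]


def build_stores():
    """Create the scalar store and the vector store from the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, tenant TEXT, topic TEXT)"
    )
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?)",
        [(doc_id, tenant, topic) for doc_id, tenant, topic, _ in DOCS],
    )
    vectors = {doc_id: emb for doc_id, _, _, emb in DOCS}
    return conn, vectors


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def multi_step_search(conn, vectors, query_vec, tenant, top_k=2):
    """Step 1: scalar filter in SQL. Step 2: rank the candidates by similarity."""
    rows = conn.execute(
        "SELECT id FROM docs WHERE tenant = ?", (tenant,)
    ).fetchall()
    candidate_ids = [row[0] for row in rows]
    scored = [
        (doc_id, cosine_similarity(query_vec, vectors[doc_id]))
        for doc_id in candidate_ids
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def topic_counts(conn, tenant):
    """An aggregation that the relational side handles with plain SQL."""
    return conn.execute(
        "SELECT topic, COUNT(*) FROM docs WHERE tenant = ? "
        "GROUP BY topic ORDER BY topic",
        (tenant,),
    ).fetchall()


conn, vectors = build_stores()
results = multi_step_search(conn, vectors, [1.0, 0.0, 0.0], "tenant_a")
counts = topic_counts(conn, "tenant_a")
print(results)
print(counts)
```

The cost of this split is visible even in a toy example: every search makes two round trips, and the two stores must be kept in sync on the shared id. When the only scalar condition is a simple filter like the tenant check, keeping that field next to the embedding in the vector database avoids both problems.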