
A component for semantic search, usually used to look up context for LLMs. Use with an Agent for Retrieval-Augmented Generation (RAG).
Found a bug? Feature request? File it here.
Install the package:
npm install @convex-dev/rag
Then create a convex.config.ts file in your app's convex/ folder and install the
component by calling use:
// convex/convex.config.ts
import { defineApp } from "convex/server";
import rag from "@convex-dev/rag/convex.config.js";
const app = defineApp();
app.use(rag);
export default app;
Run npx convex codegen if npx convex dev isn't already running.
// convex/example.ts
import { action } from "./_generated/server";
import { components } from "./_generated/api";
import { v } from "convex/values";
import { RAG } from "@convex-dev/rag";
// Any AI SDK model that supports embeddings will work.
import { openai } from "@ai-sdk/openai";

const rag = new RAG(components.rag, {
  textEmbeddingModel: openai.embedding("text-embedding-3-small"),
  embeddingDimension: 1536, // Needs to match your embedding model
});
Add content with text chunks. Each call to add will create a new entry. It
will embed the chunks automatically if you don't provide them.
export const add = action({
  args: { text: v.string() },
  handler: async (ctx, { text }) => {
    // Add the text to a namespace shared by all users.
    await rag.add(ctx, {
      namespace: "all-users",
      text,
    });
  },
});
See below for how to chunk the text yourself or add content asynchronously, e.g. to handle large files.
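As a preview, here is a minimal, hedged sketch of supplying your own chunks, assuming rag.add also accepts a chunks array in place of text (the chunking docs referenced below cover the actual API):
export const addPreChunked = action({
  args: { chunks: v.array(v.string()) },
  handler: async (ctx, { chunks }) => {
    // Assumption: passing chunks skips the built-in splitter and embeds each
    // provided chunk as-is.
    await rag.add(ctx, {
      namespace: "all-users",
      chunks,
    });
  },
});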
Search across content with vector similarity. The search returns:
- text is a string with the full content of the results, for convenience. It is in order of the entries, with titles at each entry boundary, and separators between non-sequential chunks. See below for more details.
- results is an array of matching chunks with scores and more metadata.
- entries is an array of the entries that matched the query. Each result has an entryId referencing one of these source entries.
- usage contains embedding token usage information. It will be { tokens: 0 } if no embedding was performed (e.g. when passing pre-computed embeddings).

export const search = action({
  args: {
    query: v.string(),
  },
  handler: async (ctx, args) => {
    const { results, text, entries, usage } = await rag.search(ctx, {
      namespace: "global",
      query: args.query,
      limit: 10,
      vectorScoreThreshold: 0.5, // Only return results with a score >= 0.5
    });
    return { results, text, entries, usage };
  },
});
Once you have searched for the context, you can use it with an LLM.
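For instance, here is a minimal sketch of passing the combined text from rag.search to the AI SDK's generateText yourself; the prompt format is just an assumption, and text and args.query come from the search handler above:
// At the top of the file:
import { generateText } from "ai";

// Inside the handler, after rag.search:
const { text: answer } = await generateText({
  model: openai.chat("gpt-4o-mini"),
  // Use the combined search text as context for the user's question.
  prompt: `# Context:\n\n${text}\n\n# Question:\n\n${args.query}`,
});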
Generally you'll already be using something to make LLM requests, e.g. the Agent Component, which tracks the message history for you. See the Agent Component docs for more details on doing RAG with the Agent Component.
However, if you just want a one-off response, you can use the generateText
function as a convenience.
This will automatically search for relevant entries and use them as context for the LLM, using default formatting.
The arguments to generateText are compatible with all arguments to
generateText from the AI SDK.
export const askQuestion = action({
  args: {
    prompt: v.string(),
  },
  handler: async (ctx, args) => {
    // getAuthUserId comes from your auth setup (e.g. @convex-dev/auth).
    const userId = await getAuthUserId(ctx);
    const { text, context } = await rag.generateText(ctx, {
      search: { namespace: userId, limit: 10 },
      prompt: args.prompt,
      model: openai.chat("gpt-4o-mini"),
    });
    return { answer: text, context };
  },
});
Note: You can specify any of the search options available on rag.search.
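For example, reusing options and variables from askQuestion above (a sketch, not an exhaustive list of options):
const { text, context } = await rag.generateText(ctx, {
  search: {
    namespace: userId,
    limit: 10,
    vectorScoreThreshold: 0.5, // same option as in rag.search above
  },
  prompt: args.prompt,
  model: openai.chat("gpt-4o-mini"),
});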
You can provide filters when adding content and use them to search. To do this, you'll need to give the RAG component a list of the filter names. You can optionally provide a type parameter for type safety (no runtime validation).
Note: these filters can be OR'd together when searching. In order to get an AND,
you provide a filter with a more complex value, such as categoryAndType below.
// convex/example.ts
import { components } from "./_generated/api";
import { RAG } from "@convex-dev/rag";
// Any AI SDK model that supports embeddings will work.
import { openai } from "@ai-sdk/openai";
// Optional: Add type safety to your filters.
type FilterTypes = {
  category: string;
  contentType: string;
  categoryAndType: { category: string; contentType: string };
};

const rag = new RAG<FilterTypes>(components.rag, {
  textEmbeddingModel: openai.embedding("text-embedding-3-small"),
  embeddingDimension: 1536, // Needs to match your embedding model
  filterNames: ["category", "contentType", "categoryAndType"],
});
Adding content with filters:
await rag.add(ctx, {
  namespace: "global",
  text,
  filterValues: [
    { name: "category", value: "news" },
    { name: "contentType", value: "article" },
    {
      name: "categoryAndType",
      value: { category: "news", contentType: "article" },
    },
  ],
});
Search with metadata filters:
export const searchForNewsOrSports = action({
  args: {
    query: v.string(),
  },
  handler: async (ctx, args) => {
    const userId = await getUserId(ctx);
    if (!userId) throw new Error("Unauthorized");
    const results = await rag.search(ctx, {
      namespace: userId,
      query: args.query,
      filters: [
        { name: "category", value: "news" },
        { name: "category", value: "sports" },
      ],
      limit: 10,
    });
    return results;
  },
});
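To require both a category and a content type (an AND rather than an OR), search on the composite categoryAndType filter added above, inside a handler like the one shown here:
const results = await rag.search(ctx, {
  namespace: userId,
  query: args.query,
  filters: [
    {
      name: "categoryAndType",
      value: { category: "news", contentType: "article" },
    },
  ],
  limit: 10,
});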
Instead of getting just the single matching chunk, you can request surrounding chunks so there's more context to the result.
Note: If results have overlapping ranges, duplicate chunks are not returned; instead, priority goes to adding the "before" context to each chunk. For example, if you requested 2 chunks before and 1 after, and your results were the chunks at orders 1, 4, and 7 of the same entryId, the results would be:
[
  // Only one before chunk available, and leaves chunk2 for the next result.
  { order: 1, content: [chunk0, chunk1], startOrder: 0, ... },
  // 2 before chunks available, but leaves chunk5 for the next result.
  { order: 4, content: [chunk2, chunk3, chunk4], startOrder: 2, ... },
  // 2 before chunks available, and includes one after chunk.
  { order: 7, content: [chunk5, chunk6, chunk7, chunk8], startOrder: 5, ... },
]
export const searchWithContext = action({
  args: {
    query: v.string(),
    userId: v.string(),
  },
  handler: async (ctx, args) => {
    const { results, text, entries, usage } = await rag.search(ctx, {
      namespace: args.userId,
      query: args.query,
      chunkContext: { before: 2, after: 1 }, // Include 2 chunks before, 1 after
      limit: 5,
    });
    return { results, text, entries, usage };
  },
});
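If you want a different layout than the convenience text string, you can assemble your own context from entries and results. A hedged sketch, assuming result content items carry a text field and entries may carry a title (check the actual types exported by @convex-dev/rag):
// Hypothetical helper; the parameter shapes below are assumptions, not the
// component's exported types.
function formatContext(
  entries: Array<{ entryId: string; title?: string | null }>,
  results: Array<{ entryId: string; content: Array<{ text: string }> }>,
): string {
  return entries
    .map((entry) => {
      // Gather the chunk text of every result that references this entry.
      const chunks = results
        .filter((r) => r.entryId === entry.entryId)
        .flatMap((r) => r.content.map((c) => c.text));
      return `${entry.title ?? "Untitled entry"}:\n${chunks.join("\n")}`;
    })
    .join("\n\n");
}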