
A component for semantic search, usually used to look up context for LLMs. Use with an Agent for Retrieval-Augmented Generation (RAG).
Found a bug? Feature request? File it here.
Install the package:
npm install @convex-dev/rag
Then create a convex.config.ts file in your app's convex/ folder and install the
component by calling use:
// convex/convex.config.ts
import { defineApp } from "convex/server";
import rag from "@convex-dev/rag/convex.config.js";
const app = defineApp();
app.use(rag);
export default app;
Run npx convex codegen if npx convex dev isn't already running.
// convex/example.ts
import { action } from "./_generated/server";
import { components } from "./_generated/api";
import { v } from "convex/values";
import { RAG } from "@convex-dev/rag";
// Any AI SDK model that supports embeddings will work.
import { openai } from "@ai-sdk/openai";

const rag = new RAG(components.rag, {
  textEmbeddingModel: openai.embedding("text-embedding-3-small"),
  embeddingDimension: 1536, // Needs to match your embedding model
});
Add content with text chunks. Each call to add will create a new entry. It
will embed the chunks automatically if you don't provide them.
export const add = action({
  args: { text: v.string() },
  handler: async (ctx, { text }) => {
    // Add the text to a namespace shared by all users.
    await rag.add(ctx, {
      namespace: "all-users",
      text,
    });
  },
});
See below for how to chunk the text yourself or add content asynchronously, e.g. to handle large files.
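As a preview, here is a minimal, hedged sketch of supplying your own chunks, assuming rag.add also accepts a chunks array in place of text (the chunking docs referenced below cover the actual API):
export const addPreChunked = action({
  args: { chunks: v.array(v.string()) },
  handler: async (ctx, { chunks }) => {
    // Assumption: passing chunks skips the built-in splitter and embeds each
    // provided chunk as-is.
    await rag.add(ctx, {
      namespace: "all-users",
      chunks,
    });
  },
});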
Search across content with vector similarity. The search returns:
- text is a string with the full content of the results, for convenience. It is in order of the entries, with titles at each entry boundary, and separators between non-sequential chunks. See below for more details.
- results is an array of matching chunks with scores and more metadata.
- entries is an array of the entries that matched the query. Each result has an entryId referencing one of these source entries.
- usage contains embedding token usage information. It will be { tokens: 0 } if no embedding was performed (e.g. when passing pre-computed embeddings).

export const search = action({
  args: {
    query: v.string(),
  },
  handler: async (ctx, args) => {
    const { results, text, entries, usage } = await rag.search(ctx, {
      namespace: "global",
      query: args.query,
      limit: 10,
      vectorScoreThreshold: 0.5, // Only return results with a score >= 0.5
    });
    return { results, text, entries, usage };
  },
});
Once you have searched for the context, you can use it with an LLM.
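For instance, here is a minimal sketch of passing the combined text from rag.search to the AI SDK's generateText yourself; the prompt format is just an assumption, and text and args.query come from the search handler above:
// At the top of the file:
import { generateText } from "ai";

// Inside the handler, after rag.search:
const { text: answer } = await generateText({
  model: openai.chat("gpt-4o-mini"),
  // Use the combined search text as context for the user's question.
  prompt: `# Context:\n\n${text}\n\n# Question:\n\n${args.query}`,
});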
Generally you'll already be using something to make LLM requests, e.g. the Agent Component, which tracks the message history for you. See the Agent Component docs for more details on doing RAG with the Agent Component.
However, if you just want a one-off response, you can use the generateText
function as a convenience.
This will automatically search for relevant entries and use them as context for the LLM, using default formatting.
The arguments to generateText are compatible with all arguments to
generateText from the AI SDK.
export const askQuestion = action({
  args: {
    prompt: v.string(),
  },
  handler: async (ctx, args) => {
    // getAuthUserId comes from your auth setup (e.g. @convex-dev/auth).
    const userId = await getAuthUserId(ctx);
    const { text, context } = await rag.generateText(ctx, {
      search: { namespace: userId, limit: 10 },
      prompt: args.prompt,
      model: openai.chat("gpt-4o-mini"),
    });
    return { answer: text, context };
  },
});
Note: You can specify any of the search options available on rag.search.
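For example, reusing options and variables from askQuestion above (a sketch, not an exhaustive list of options):
const { text, context } = await rag.generateText(ctx, {
  search: {
    namespace: userId,
    limit: 10,
    vectorScoreThreshold: 0.5, // same option as in rag.search above
  },
  prompt: args.prompt,
  model: openai.chat("gpt-4o-mini"),
});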
You can provide filters when adding content and use them to search. To do this, you'll need to give the RAG component a list of the filter names. You can optionally provide a type parameter for type safety (no runtime validation).
Note: these filters can be OR'd together when searching. In order to get an AND,
you provide a filter with a more complex value, such as categoryAndType below.
// convex/example.ts
import { components } from "./_generated/api";
import { RAG } from "@convex-dev/rag";
// Any AI SDK model that supports embeddings will work.
import { openai } from "@ai-sdk/openai";
// Optional: Add type safety to your filters.
type FilterTypes = {
  category: string;
  contentType: string;
  categoryAndType: { category: string; contentType: string };
};

const rag = new RAG<FilterTypes>(components.rag, {
  textEmbeddingModel: openai.embedding("text-embedding-3-small"),
  embeddingDimension: 1536, // Needs to match your embedding model
  filterNames: ["category", "contentType", "categoryAndType"],
});
Adding content with filters:
await rag.add(ctx, {
  namespace: "global",
  text,
  filterValues: [
    { name: "category", value: "news" },
    { name: "contentType", value: "article" },
    {
      name: "categoryAndType",
      value: { category: "news", contentType: "article" },
    },
  ],
});
Search with metadata filters:
export const searchForNewsOrSports = action({
  args: {
    query: v.string(),
  },
  handler: async (ctx, args) => {
    const userId = await getUserId(ctx);
    if (!userId) throw new Error("Unauthorized");
    const results = await rag.search(ctx, {
      namespace: userId,
      query: args.query,
      filters: [
        { name: "category", value: "news" },
        { name: "category", value: "sports" },
      ],
      limit: 10,
    });
    return results;
  },
});
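To require both a category and a content type (an AND rather than an OR), search on the composite categoryAndType filter added above, inside a handler like the one shown here:
const results = await rag.search(ctx, {
  namespace: userId,
  query: args.query,
  filters: [
    {
      name: "categoryAndType",
      value: { category: "news", contentType: "article" },
    },
  ],
  limit: 10,
});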
Instead of getting just the single matching chunk, you can request surrounding chunks so there's more context to the result.
Note: If results have overlapping ranges, duplicate chunks are not returned; instead, priority goes to adding the "before" context to each chunk. For example, if you requested 2 chunks before and 1 after, and your results were the chunks at orders 1, 4, and 7 of the same entryId, the results would be:
[
  // Only one before chunk available, and leaves chunk2 for the next result.
  { order: 1, content: [chunk0, chunk1], startOrder: 0, ... },
  // 2 before chunks available, but leaves chunk5 for the next result.
  { order: 4, content: [chunk2, chunk3, chunk4], startOrder: 2, ... },
  // 2 before chunks available, and includes one after chunk.
  { order: 7, content: [chunk5, chunk6, chunk7, chunk8], startOrder: 5, ... },
]
export const searchWithContext = action({
  args: {
    query: v.string(),
    userId: v.string(),
  },
  handler: async (ctx, args) => {
    const { results, text, entries, usage } = await rag.search(ctx, {
      namespace: args.userId,
      query: args.query,
      chunkContext: { before: 2, after: 1 }, // Include 2 chunks before, 1 after
      limit: 5,
    });
    return { results, text, entries, usage };
  },
});
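If you want a different layout than the convenience text string, you can assemble your own context from entries and results. A hedged sketch, assuming result content items carry a text field and entries may carry a title (check the actual types exported by @convex-dev/rag):
// Hypothetical helper; the parameter shapes below are assumptions, not the
// component's exported types.
function formatContext(
  entries: Array<{ entryId: string; title?: string | null }>,
  results: Array<{ entryId: string; content: Array<{ text: string }> }>,
): string {
  return entries
    .map((entry) => {
      // Gather the chunk text of every result that references this entry.
      const chunks = results
        .filter((r) => r.entryId === entry.entryId)
        .flatMap((r) => r.content.map((c) => c.text));
      return `${entry.title ?? "Untitled entry"}:\n${chunks.join("\n")}`;
    })
    .join("\n\n");
}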