This document provides a comprehensive overview of RAGFlow's architecture, core capabilities, and system organization. It introduces the major subsystems, their interactions, and the technology stack that powers the platform.
RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine that combines deep document understanding, hybrid search, and agent capabilities to provide a production-ready AI system. For specific implementation details on deployment, see Getting Started and Deployment. For LLM provider configuration, see LLM Integration System. For document processing pipelines, see Document Processing Pipeline. For agent workflow orchestration, see Agent and Workflow System.
RAGFlow is a full-stack RAG system built on a layered architecture that processes documents, indexes them in vector stores, and enables intelligent question-answering with grounded citations. The system is designed to handle enterprise-scale deployments with multi-tenancy, pluggable components, and support for 50+ LLM providers.
The codebase is organized into several key subsystems:
- API layer (`api/apps/`, `rag/svr/`)
- Web frontend (`web/`)
- Document processing and NLP (`rag/app/`, `rag/nlp/`)
- LLM integration (`rag/llm/`)
- Agent system (`agent/`)

Sources: README.md:73-75, rag/svr/task_executor.py:1-72, api/db/db_models.py:1-30
Architecture Diagram: RAGFlow Component Organization
The architecture follows a strict layering pattern where each layer has well-defined responsibilities and communicates through standardized interfaces. The API layer exposes RESTful endpoints and handles request routing. The Core Engine processes documents asynchronously using a task queue. The Agent System provides workflow orchestration capabilities. The LLM Layer abstracts model providers. The Data Layer persists all state.
Sources: README.md:136-141, rag/svr/task_executor.py:75-92, api/db/db_models.py:1-100, rag/llm/__init__.py:1-50
### Frontend (`web/`)

The React-based frontend provides a multilingual interface supporting 12 languages. It uses UmiJS for routing (60+ routes), React Query for server state management, and Tailwind CSS for styling. The application communicates with the backend through a proxy configuration that routes requests to ports 9380 (general) and 9381 (admin).
Key Modules:
- `web/src/routes/`: Route definitions
- `web/src/locales/`: Language resources
- `web/src/components/`: Shared UI components

Sources: README.md:1-50, web/ directory structure
### Backend API Server (`api/`)

The backend runs a Quart-based asynchronous HTTP server that exposes RESTful endpoints for document management, chat completions, knowledge base operations, and agent execution. The main entry point is `api/ragflow_server.py`.
Key Services (in api/db/services/):
- `DocumentService`: Document CRUD and status tracking
- `DialogService`: Chat completions with RAG
- `KnowledgebaseService`: Dataset management
- `LLMBundle`: Unified LLM access with tenant isolation
- `TaskService`: Background job orchestration

Primary Endpoints (in `api/apps/`):
- `document_app.py`: Document upload and parsing
- `conversation_app.py`: Chat sessions
- `kb_app.py`: Knowledge base operations
- `llm_app.py`: Model configuration
- `sdk/session.py`: SDK-compatible endpoints

Sources: api/apps/document_app.py:1-50, api/db/services/dialog_service.py:1-50, api/apps/sdk/session.py:1-50
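For orientation, here is a minimal sketch of what an async endpoint in this style can look like. The route path, payload fields, and response shape are illustrative placeholders, not RAGFlow's actual handler:

```python
# Illustrative only: a minimal async Quart endpoint in the style of api/apps/.
# The route, request fields, and response envelope are hypothetical.
from quart import Quart, request, jsonify

app = Quart(__name__)

@app.route("/v1/document/upload", methods=["POST"])
async def upload_document():
    files = await request.files            # multipart uploads are read asynchronously
    form = await request.form
    kb_id = form.get("kb_id")
    if not kb_id or "file" not in files:
        return jsonify({"code": 400, "message": "kb_id and file are required"}), 400
    # A real handler would persist the blob to object storage and enqueue a parsing Task.
    return jsonify({"code": 0, "data": {"kb_id": kb_id, "name": files["file"].filename}})
```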
### Task Executor (`rag/svr/task_executor.py`)

The `task_executor.py` module is the core document processing engine. It runs as a separate process that consumes tasks from Redis queues using the consumer group pattern.
Task Processing Flow Diagram
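As a reference point, here is a minimal consumer-group loop using redis-py. The stream, group, and consumer names are placeholders, not the identifiers `task_executor.py` actually uses:

```python
# Minimal Redis consumer-group loop with redis-py; names are hypothetical.
import redis

def process_task(fields: dict) -> None:
    """Placeholder for the real work: parse, chunk, and embed a document."""
    print("processing", fields)

r = redis.Redis()
STREAM, GROUP, CONSUMER = "task_stream", "task_group", "worker_1"

try:
    r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
except redis.ResponseError:
    pass  # group already exists

while True:
    # Block up to 5s waiting for one new task assigned to this consumer.
    entries = r.xreadgroup(GROUP, CONSUMER, {STREAM: ">"}, count=1, block=5000)
    for _stream, messages in entries or []:
        for msg_id, fields in messages:
            process_task(fields)
            r.xack(STREAM, GROUP, msg_id)  # acknowledge so it is not redelivered
```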
Key Functions:
- `collect()` (rag/svr/task_executor.py:164-218): Polls Redis for tasks
- `build_chunks()` (rag/svr/task_executor.py:224-471): Applies chunking strategies
- `embedding()` (rag/svr/task_executor.py:512-564): Generates embeddings
- `run_dataflow()` (rag/svr/task_executor.py:566-698): Executes custom pipelines

The executor uses the `FACTORY` dict (rag/svr/task_executor.py:75-92) to map parser types to chunking implementations:
| Parser Type | Module | Strategy |
|---|---|---|
| `naive` | `rag.app.naive` | Generic text splitting |
| `qa` | `rag.app.qa` | Q&A pair extraction |
| `table` | `rag.app.table` | Table structure preservation |
| `paper` | `rag.app.paper` | Academic paper sections |
| `book` | `rag.app.book` | Chapter hierarchy |
| `resume` | `rag.app.resume` | Resume sections |
Sources: rag/svr/task_executor.py:75-92, rag/svr/task_executor.py:224-471, rag/svr/task_executor.py:512-564
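A simplified sketch of this registration pattern, assuming each `rag.app` module exposes a `chunk()` entry point (the real dict and signatures carry more detail):

```python
# Sketch of the parser-type -> chunker mapping; simplified, and it assumes
# each rag.app module exposes a module-level chunk() entry point.
from rag.app import book, naive, paper, qa, resume, table

FACTORY = {
    "naive": naive,
    "qa": qa,
    "table": table,
    "paper": paper,
    "book": book,
    "resume": resume,
}

def build_chunks_for(parser_type: str, filename: str, binary: bytes):
    chunker = FACTORY.get(parser_type, naive)  # fall back to generic splitting
    return chunker.chunk(filename, binary)
```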
### Hybrid Search (`rag/nlp/search.py`)

The `Dealer` class (rag/nlp/search.py:37-41) implements hybrid search combining vector similarity and full-text matching.
Core Methods:
- `search()` (rag/nlp/search.py:74-149): Main search entry point
- `get_vector()` (rag/nlp/search.py:53-61): Creates a `MatchDenseExpr` for vector search
- `hybrid_similarity()` (rag/nlp/search.py:299-472): Fuses vector and keyword scores

The search flow uses the `docStoreConn` abstraction to query either Elasticsearch or Infinity:
Sources: rag/nlp/search.py:37-149, rag/nlp/search.py:299-472
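Conceptually, the fusion step combines a keyword (BM25-style) score with a vector (cosine) score. A minimal sketch, with an illustrative weight rather than the exact formula in `hybrid_similarity()`:

```python
# Minimal hybrid-score fusion sketch: weighted sum of keyword and vector
# similarity. The 0.7 weight is illustrative, not the verified default.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def hybrid_score(keyword_score: float, query_vec: np.ndarray,
                 doc_vec: np.ndarray, vector_weight: float = 0.7) -> float:
    # Higher vector_weight favors semantic similarity over exact term overlap.
    return (1 - vector_weight) * keyword_score + vector_weight * cosine(query_vec, doc_vec)
```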
### Chat Orchestration (`api/db/services/dialog_service.py`)

The `async_chat()` function (api/db/services/dialog_service.py:281-441) orchestrates the complete RAG workflow:
- Calls `retriever.retrieval()` for hybrid search
- Generates the answer from the retrieved chunks, inserting citations as `[ID:x]` markers

RAG Completion Sequence Diagram
Sources: api/db/services/dialog_service.py:281-441, rag/nlp/search.py:299-472
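A hedged sketch of this retrieve-then-generate flow; the function names, prompt format, and `top_k` value are illustrative placeholders, not the actual implementation:

```python
# Sketch of the retrieve -> prompt -> cite flow that async_chat() performs.
# retriever.retrieval() and llm.chat() signatures here are assumptions.
async def rag_answer(question: str, retriever, llm) -> str:
    hits = await retriever.retrieval(question, top_k=6)   # hybrid search (assumed API)
    context = "\n".join(f"[ID:{i}] {h['content']}" for i, h in enumerate(hits))
    prompt = (
        "Answer using only the passages below and cite them as [ID:x].\n"
        f"{context}\n\nQuestion: {question}"
    )
    return await llm.chat(prompt)                         # assumed LLM interface
```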
### LLM Integration (`rag/llm/`)

The LLM subsystem uses a factory pattern with model-specific implementations organized by type:
Factory Registration:
- `ChatModel` (rag/llm/chat_model.py:1-100): Dict of chat implementations
- `EmbeddingModel` (rag/llm/embedding_model.py:1-100): Dict of embedding implementations
- `RerankModel` (rag/llm/rerank_model.py:1-50): Dict of rerank implementations

Base Classes:
- `chat_model.Base` (rag/llm/chat_model.py:64-487): Async chat, streaming, tool calling
- `embedding_model.Base` (rag/llm/embedding_model.py:37-51): Text embedding interface
- `rerank_model.Base` (rag/llm/rerank_model.py:28-38): Reranking interface

Unified Access:
The `LLMBundle` class (api/db/services/llm_service.py:1-200) provides tenant-aware access with usage tracking:
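A usage sketch under the assumption that the constructor takes a tenant ID and a model type; the argument order and return shape are unverified assumptions, not the documented API:

```python
# Hedged usage sketch of tenant-scoped access through LLMBundle.
from api.db import LLMType
from api.db.services.llm_service import LLMBundle

chat_mdl = LLMBundle("tenant-123", LLMType.CHAT)        # assumed constructor shape
embed_mdl = LLMBundle("tenant-123", LLMType.EMBEDDING)

# Assumed return shape: vectors plus the token count charged to the tenant.
vectors, tokens = embed_mdl.encode(["What is RAGFlow?"])
```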
Configuration Source: Models are loaded from conf/llm_factories.json into database tables LLMFactories and LLM. Per-tenant API keys are stored in TenantLLM.
Sources: rag/llm/chat_model.py:64-487, rag/llm/embedding_model.py:37-90, api/db/services/llm_service.py:1-200, conf/llm_factories.json:1-50
### Agent System (`agent/`)

The agent system enables workflow orchestration through a JSON DSL executed by the Canvas class.
Key Components:
- `Canvas` (agent/canvas.py): DSL parser and execution engine
- `ComponentBase` (agent/component.py): Base class for workflow nodes
- `Agent` (agent/agent.py): ReAct loop with tool use

The Canvas DSL defines nodes and edges where each node is a component with inputs/outputs. The `Canvas.run()` method executes the graph asynchronously, resolving variable references like `{sys.query}` and `{component_id@field}`.
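To make the DSL concrete, here is a hand-written sketch of a minimal graph; the exact key names and component types are illustrative, not the authoritative schema:

```python
# Hypothetical Canvas DSL sketch: three nodes wired begin -> retrieval -> generate,
# with variable references resolved at run time. Key names are assumptions.
dsl = {
    "components": {
        "begin": {"obj": {"component_name": "Begin", "params": {}}},
        "retrieval": {"obj": {"component_name": "Retrieval",
                              "params": {"query": "{sys.query}"}}},
        "generate": {"obj": {"component_name": "Generate",
                             # pull the retrieval node's output by reference
                             "params": {"context": "{retrieval@chunks}"}}},
    },
    "edges": [
        {"source": "begin", "target": "retrieval"},
        {"source": "retrieval", "target": "generate"},
    ],
}
```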
Sources: README.md:81-84, agent/ directory structure
RAGFlow uses a multi-database architecture where each store serves specific purposes:
| Store | Purpose | Key Data | Port |
|---|---|---|---|
| MySQL 8.0 | Relational metadata | Users, Tenants, Knowledgebase, Document, Dialog, Task | 3306 |
| Redis/Valkey | Cache and queues | Task messages, session state, LLM cache | 6379 |
| MinIO | Object storage | Uploaded files, images, thumbnails | 9000 |
| Elasticsearch/Infinity | Full-text + vectors | Indexed chunks, embeddings | 1200/23817 |
Database Models (in api/db/db_models.py):
- `Tenant` (api/db/db_models.py): Multi-tenancy root
- `Knowledgebase`: Dataset configuration and embedding model
- `Document`: File metadata and processing status
- `Task`: Background job state machine
- `Dialog`: Chat assistant configuration
- `Conversation`: Chat session messages

Document Engine Abstraction: The `docStoreConn` interface in `common.doc_store.doc_store_base` supports multiple backends, configured via the `DOC_ENGINE` environment variable:
- `elasticsearch` (default): docker/.env:12
- `infinity`: High-performance vector DB
- `opensearch`: ES-compatible alternative
- `oceanbase`: Distributed SQL with vector support

Sources: api/db/db_models.py:1-100, docker/.env:1-50, README.md:272-293
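A sketch of environment-driven selection behind such an interface; the connection class and module names below are placeholders, not the real implementation paths:

```python
# Hypothetical backend selection keyed on DOC_ENGINE; class/module names are
# placeholders for the real connection implementations in common/doc_store/.
import os

def get_doc_store():
    engine = os.environ.get("DOC_ENGINE", "elasticsearch")
    if engine == "elasticsearch":
        from common.doc_store.es_conn import ESConnection            # assumed path
        return ESConnection()
    if engine == "infinity":
        from common.doc_store.infinity_conn import InfinityConnection  # assumed path
        return InfinityConnection()
    raise ValueError(f"Unsupported DOC_ENGINE: {engine}")
```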
Document Ingestion Pipeline Diagram
The pipeline is fully asynchronous with progress tracking. Each chunk gets fields:
- `content_with_weight`: Weighted text content
- `content_ltks`: Tokenized content
- `q_N_vec`: Embedding vector (N = dimension)
- `important_kwd`: Optional LLM-extracted keywords
- `question_kwd`: Optional LLM-generated questions
- `page_num_int`, `top_int`: Position metadata

Sources: rag/svr/task_executor.py:224-471, api/apps/document_app.py:52-86, api/db/services/document_service.py:1-200
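An illustrative chunk record with these fields; the values are invented and the real index carries additional fields:

```python
# Made-up example of a single indexed chunk; field names follow the list above,
# values and the embedding dimension (1024) are illustrative.
chunk = {
    "content_with_weight": "RAGFlow combines deep document understanding...",
    "content_ltks": "ragflow combin deep document understand",  # tokenized form
    "q_1024_vec": [0.012, -0.087, ...],    # embedding; name encodes the dimension
    "important_kwd": ["RAGFlow", "document understanding"],
    "question_kwd": ["What is RAGFlow?"],
    "page_num_int": [1],                   # page where the chunk appears
    "top_int": [120],                      # vertical position on that page
}
```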
| Layer | Technologies |
|---|---|
| Frontend | React 18, UmiJS, Tailwind CSS, React Query, Zustand, Ant Design, Radix UI |
| Backend | Python 3.12, Quart (async Flask), Peewee ORM, asyncio |
| Document Processing | DeepDoc (OCR/Layout), OpenCV, NumPy, pandas |
| NLP/Search | Custom tokenizer, BM25 (via doc store), HNSW (vector search) |
| LLM Integration | OpenAI SDK, LiteLLM, custom adapters for 50+ providers |
| Storage | MySQL 8.0, Redis 7.x, MinIO, Elasticsearch 8.11/Infinity |
| Deployment | Docker Compose, multi-stage builds, environment-based profiles |
| Development | uv (package manager), pre-commit hooks, pytest, Ruff (linting) |
Configuration Files:

- `docker/.env`: Deployment environment variables (ports, `DOC_ENGINE`, credentials)
- `conf/llm_factories.json`: Catalog of supported LLM providers and models

Sources: README.md:143-150, docker/.env:1-100, conf/llm_factories.json:1-50
Docker Deployment: From the `docker/` directory, the stack is typically started with `docker compose -f docker-compose.yml up -d`. This starts:
- `ragflow` container: Runs both `ragflow_server.py` and `task_executor.py`
- Dependency containers for the data stores listed above (MySQL, Redis/Valkey, MinIO, Elasticsearch/Infinity)

Development Mode: In development, the two backend processes are run directly from source.
Key Processes:
- `api/ragflow_server.py`: HTTP API server (Quart app on port 9380)
- `rag/svr/task_executor.py`: Background task processor (multiple workers possible)

Sources: README.md:184-375, docker/docker-compose.yml
For a chat completion request:
1. The client sends `POST /chats/{chat_id}/completions`
2. The endpoint handler dispatches to `async_chat()`, which runs retrieval and streams the generated answer

Sources: api/apps/sdk/session.py:124-173, api/db/services/dialog_service.py:281-441
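A hedged client-side sketch using httpx; the base path, auth header, and payload fields follow common REST conventions and should be checked against the actual API reference:

```python
# Illustrative client call to the completion endpoint; host, path prefix,
# auth scheme, and payload fields are assumptions, not verified contract.
import httpx

resp = httpx.post(
    "http://localhost:9380/api/v1/chats/CHAT_ID/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"question": "What is RAGFlow?", "stream": False},
    timeout=60,
)
print(resp.json())
```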
RAGFlow's architecture provides several extension mechanisms:
- Custom chunking strategies: Add a module in `rag/app/` and register it in the `FACTORY` dict (see the sketch after this list)
- New LLM providers: Subclass the `Base` class in the `rag/llm/` modules and register in the factory
- Alternative document stores: Implement the `DocStoreConnection` interface in `common/doc_store/`
- Custom agent components: Extend `ComponentBase` in `agent/component.py`

Sources: rag/svr/task_executor.py:75-92, rag/llm/__init__.py:25-50, agent/component.py
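For the first extension point, a hedged sketch of a custom chunker; the `chunk()` signature is simplified from the real interface and `my_parser` is a hypothetical name:

```python
# Hypothetical custom chunker module; the signature is modeled on the
# description above and simplified from the real chunking interface.
def chunk(filename: str, binary: bytes, **kwargs) -> list[dict]:
    """Split a raw document into chunk dicts for indexing (simplified)."""
    text = binary.decode("utf-8", errors="ignore")
    return [{"content_with_weight": part}
            for part in text.split("\n\n") if part.strip()]

# Conceptual registration in rag/svr/task_executor.py:
# FACTORY["my_parser"] = my_parser_module   # module exposing this chunk()
```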