This document provides a comprehensive overview of RAGFlow's architecture, core capabilities, and system organization. It introduces the major subsystems, their interactions, and the technology stack that powers the platform.
RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine that combines deep document understanding, hybrid search, and agent capabilities to provide a production-ready AI system. For specific implementation details on deployment, see Getting Started and Deployment. For LLM provider configuration, see LLM Integration System. For document processing pipelines, see Document Processing Pipeline. For agent workflow orchestration, see Agent and Workflow System.
RAGFlow is a full-stack RAG system built on a layered architecture that processes documents, indexes them in vector stores, and enables intelligent question-answering with grounded citations. The system is designed to handle enterprise-scale deployments with multi-tenancy, pluggable components, and support for 50+ LLM providers.
The codebase is organized into several key subsystems:
- API layer (`api/apps/`, `rag/svr/`)
- Web frontend (`web/`)
- Document processing and NLP (`rag/app/`, `rag/nlp/`)
- LLM integration (`rag/llm/`)
- Agent system (`agent/`)

Sources: README.md:73-75, rag/svr/task_executor.py:1-72, api/db/db_models.py:1-30
Architecture Diagram: RAGFlow Component Organization
The architecture follows a strict layering pattern where each layer has well-defined responsibilities and communicates through standardized interfaces. The API layer exposes RESTful endpoints and handles request routing. The Core Engine processes documents asynchronously using a task queue. The Agent System provides workflow orchestration capabilities. The LLM Layer abstracts model providers. The Data Layer persists all state.
Sources: README.md:136-141, rag/svr/task_executor.py:75-92, api/db/db_models.py:1-100, rag/llm/__init__.py:1-50
### Frontend (`web/`)

The React-based frontend provides a multilingual interface supporting 12 languages. It uses UmiJS for routing (60+ routes), React Query for server state management, and Tailwind CSS for styling. The application communicates with the backend through a proxy configuration that routes requests to ports 9380 (general) and 9381 (admin).
Key Modules:
- `web/src/routes/`: Route definitions
- `web/src/locales/`: Language resources
- `web/src/components/`: Shared UI components

Sources: README.md:1-50, web/ directory structure
### Backend API Server (`api/`)

The backend runs a Quart-based asynchronous HTTP server that exposes RESTful endpoints for document management, chat completions, knowledge base operations, and agent execution. The main entry point is `api/ragflow_server.py`.
Key Services (in api/db/services/):
- `DocumentService`: Document CRUD and status tracking
- `DialogService`: Chat completions with RAG
- `KnowledgebaseService`: Dataset management
- `LLMBundle`: Unified LLM access with tenant isolation
- `TaskService`: Background job orchestration

Primary Endpoints (in `api/apps/`):
- `document_app.py`: Document upload and parsing
- `conversation_app.py`: Chat sessions
- `kb_app.py`: Knowledge base operations
- `llm_app.py`: Model configuration
- `sdk/session.py`: SDK-compatible endpoints

Sources: api/apps/document_app.py:1-50, api/db/services/dialog_service.py:1-50, api/apps/sdk/session.py:1-50
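For orientation, here is a minimal sketch of what an async endpoint in this style can look like. The route path, payload fields, and response shape are illustrative placeholders, not RAGFlow's actual handler:

```python
# Illustrative only: a minimal async Quart endpoint in the style of api/apps/.
# The route, request fields, and response envelope are hypothetical.
from quart import Quart, request, jsonify

app = Quart(__name__)

@app.route("/v1/document/upload", methods=["POST"])
async def upload_document():
    files = await request.files            # multipart uploads are read asynchronously
    form = await request.form
    kb_id = form.get("kb_id")
    if not kb_id or "file" not in files:
        return jsonify({"code": 400, "message": "kb_id and file are required"}), 400
    # A real handler would persist the blob to object storage and enqueue a parsing Task.
    return jsonify({"code": 0, "data": {"kb_id": kb_id, "name": files["file"].filename}})
```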
### Task Executor (`rag/svr/task_executor.py`)

The `task_executor.py` module is the core document processing engine. It runs as a separate process that consumes tasks from Redis queues using the consumer group pattern.
Task Processing Flow Diagram
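As a reference point, here is a minimal consumer-group loop using redis-py. The stream, group, and consumer names are placeholders, not the identifiers `task_executor.py` actually uses:

```python
# Minimal Redis consumer-group loop with redis-py; names are hypothetical.
import redis

def process_task(fields: dict) -> None:
    """Placeholder for the real work: parse, chunk, and embed a document."""
    print("processing", fields)

r = redis.Redis()
STREAM, GROUP, CONSUMER = "task_stream", "task_group", "worker_1"

try:
    r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)
except redis.ResponseError:
    pass  # group already exists

while True:
    # Block up to 5s waiting for one new task assigned to this consumer.
    entries = r.xreadgroup(GROUP, CONSUMER, {STREAM: ">"}, count=1, block=5000)
    for _stream, messages in entries or []:
        for msg_id, fields in messages:
            process_task(fields)
            r.xack(STREAM, GROUP, msg_id)  # acknowledge so it is not redelivered
```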
Key Functions:
- `collect()` (rag/svr/task_executor.py:164-218): Polls Redis for tasks
- `build_chunks()` (rag/svr/task_executor.py:224-471): Applies chunking strategies
- `embedding()` (rag/svr/task_executor.py:512-564): Generates embeddings
- `run_dataflow()` (rag/svr/task_executor.py:566-698): Executes custom pipelines

The executor uses the `FACTORY` dict (rag/svr/task_executor.py:75-92) to map parser types to chunking implementations:
| Parser Type | Module | Strategy |
|---|---|---|
| `naive` | `rag.app.naive` | Generic text splitting |
| `qa` | `rag.app.qa` | Q&A pair extraction |
| `table` | `rag.app.table` | Table structure preservation |
| `paper` | `rag.app.paper` | Academic paper sections |
| `book` | `rag.app.book` | Chapter hierarchy |
| `resume` | `rag.app.resume` | Resume sections |
Sources: rag/svr/task_executor.py:75-92, rag/svr/task_executor.py:224-471, rag/svr/task_executor.py:512-564
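A simplified sketch of this registration pattern, assuming each `rag.app` module exposes a `chunk()` entry point (the real dict and signatures carry more detail):

```python
# Sketch of the parser-type -> chunker mapping; simplified, and it assumes
# each rag.app module exposes a module-level chunk() entry point.
from rag.app import book, naive, paper, qa, resume, table

FACTORY = {
    "naive": naive,
    "qa": qa,
    "table": table,
    "paper": paper,
    "book": book,
    "resume": resume,
}

def build_chunks_for(parser_type: str, filename: str, binary: bytes):
    chunker = FACTORY.get(parser_type, naive)  # fall back to generic splitting
    return chunker.chunk(filename, binary)
```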
### Hybrid Search (`rag/nlp/search.py`)

The `Dealer` class (rag/nlp/search.py:37-41) implements hybrid search combining vector similarity and full-text matching.
Core Methods:
- `search()` (rag/nlp/search.py:74-149): Main search entry point
- `get_vector()` (rag/nlp/search.py:53-61): Creates a `MatchDenseExpr` for vector search
- `hybrid_similarity()` (rag/nlp/search.py:299-472): Fuses vector and keyword scores

The search flow uses the `docStoreConn` abstraction to query either Elasticsearch or Infinity:
Sources: rag/nlp/search.py:37-149, rag/nlp/search.py:299-472
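Conceptually, the fusion step combines a keyword (BM25-style) score with a vector (cosine) score. A minimal sketch, with an illustrative weight rather than the exact formula in `hybrid_similarity()`:

```python
# Minimal hybrid-score fusion sketch: weighted sum of keyword and vector
# similarity. The 0.7 weight is illustrative, not the verified default.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def hybrid_score(keyword_score: float, query_vec: np.ndarray,
                 doc_vec: np.ndarray, vector_weight: float = 0.7) -> float:
    # Higher vector_weight favors semantic similarity over exact term overlap.
    return (1 - vector_weight) * keyword_score + vector_weight * cosine(query_vec, doc_vec)
```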
### Chat Orchestration (`api/db/services/dialog_service.py`)

The `async_chat()` function (api/db/services/dialog_service.py:281-441) orchestrates the complete RAG workflow:
- Calls `retriever.retrieval()` for hybrid search
- Generates the answer from the retrieved chunks, inserting citations as `[ID:x]` markers

RAG Completion Sequence Diagram
Sources: api/db/services/dialog_service.py:281-441, rag/nlp/search.py:299-472
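A hedged sketch of this retrieve-then-generate flow; the function names, prompt format, and `top_k` value are illustrative placeholders, not the actual implementation:

```python
# Sketch of the retrieve -> prompt -> cite flow that async_chat() performs.
# retriever.retrieval() and llm.chat() signatures here are assumptions.
async def rag_answer(question: str, retriever, llm) -> str:
    hits = await retriever.retrieval(question, top_k=6)   # hybrid search (assumed API)
    context = "\n".join(f"[ID:{i}] {h['content']}" for i, h in enumerate(hits))
    prompt = (
        "Answer using only the passages below and cite them as [ID:x].\n"
        f"{context}\n\nQuestion: {question}"
    )
    return await llm.chat(prompt)                         # assumed LLM interface
```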
### LLM Integration (`rag/llm/`)

The LLM subsystem uses a factory pattern with model-specific implementations organized by type:
Factory Registration:
- `ChatModel` (rag/llm/chat_model.py:1-100): Dict of chat implementations
- `EmbeddingModel` (rag/llm/embedding_model.py:1-100): Dict of embedding implementations
- `RerankModel` (rag/llm/rerank_model.py:1-50): Dict of rerank implementations

Base Classes:
- `chat_model.Base` (rag/llm/chat_model.py:64-487): Async chat, streaming, tool calling
- `embedding_model.Base` (rag/llm/embedding_model.py:37-51): Text embedding interface
- `rerank_model.Base` (rag/llm/rerank_model.py:28-38): Reranking interface

Unified Access:
The `LLMBundle` class (api/db/services/llm_service.py:1-200) provides tenant-aware access with usage tracking:
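A usage sketch under the assumption that the constructor takes a tenant ID and a model type; the argument order and return shape are unverified assumptions, not the documented API:

```python
# Hedged usage sketch of tenant-scoped access through LLMBundle.
from api.db import LLMType
from api.db.services.llm_service import LLMBundle

chat_mdl = LLMBundle("tenant-123", LLMType.CHAT)        # assumed constructor shape
embed_mdl = LLMBundle("tenant-123", LLMType.EMBEDDING)

# Assumed return shape: vectors plus the token count charged to the tenant.
vectors, tokens = embed_mdl.encode(["What is RAGFlow?"])
```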
Configuration Source: Models are loaded from conf/llm_factories.json into database tables LLMFactories and LLM. Per-tenant API keys are stored in TenantLLM.
Sources: rag/llm/chat_model.py:64-487, rag/llm/embedding_model.py:37-90, api/db/services/llm_service.py:1-200, conf/llm_factories.json:1-50
### Agent System (`agent/`)

The agent system enables workflow orchestration through a JSON DSL executed by the Canvas class.
Key Components:
- `Canvas` (agent/canvas.py): DSL parser and execution engine
- `ComponentBase` (agent/component.py): Base class for workflow nodes
- `Agent` (agent/agent.py): ReAct loop with tool use

The Canvas DSL defines nodes and edges where each node is a component with inputs/outputs. The `Canvas.run()` method executes the graph asynchronously, resolving variable references like `{sys.query}` and `{component_id@field}`.
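To make the DSL concrete, here is a hand-written sketch of a minimal graph; the exact key names and component types are illustrative, not the authoritative schema:

```python
# Hypothetical Canvas DSL sketch: three nodes wired begin -> retrieval -> generate,
# with variable references resolved at run time. Key names are assumptions.
dsl = {
    "components": {
        "begin": {"obj": {"component_name": "Begin", "params": {}}},
        "retrieval": {"obj": {"component_name": "Retrieval",
                              "params": {"query": "{sys.query}"}}},
        "generate": {"obj": {"component_name": "Generate",
                             # pull the retrieval node's output by reference
                             "params": {"context": "{retrieval@chunks}"}}},
    },
    "edges": [
        {"source": "begin", "target": "retrieval"},
        {"source": "retrieval", "target": "generate"},
    ],
}
```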
Sources: README.md:81-84, agent/ directory structure
RAGFlow uses a multi-database architecture where each store serves specific purposes:
| Store | Purpose | Key Data | Port |
|---|---|---|---|
| MySQL 8.0 | Relational metadata | Users, Tenants, Knowledgebase, Document, Dialog, Task | 3306 |
| Redis/Valkey | Cache and queues | Task messages, session state, LLM cache | 6379 |
| MinIO | Object storage | Uploaded files, images, thumbnails | 9000 |
| Elasticsearch/Infinity | Full-text + vectors | Indexed chunks, embeddings | 1200/23817 |
Database Models (in api/db/db_models.py):
- `Tenant` (api/db/db_models.py): Multi-tenancy root
- `Knowledgebase`: Dataset configuration and embedding model
- `Document`: File metadata and processing status
- `Task`: Background job state machine
- `Dialog`: Chat assistant configuration
- `Conversation`: Chat session messages

Document Engine Abstraction: The `docStoreConn` interface in `common.doc_store.doc_store_base` supports multiple backends, configured via the `DOC_ENGINE` environment variable:
- `elasticsearch` (default): docker/.env:12
- `infinity`: High-performance vector DB
- `opensearch`: ES-compatible alternative
- `oceanbase`: Distributed SQL with vector support

Sources: api/db/db_models.py:1-100, docker/.env:1-50, README.md:272-293
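A sketch of environment-driven selection behind such an interface; the connection class and module names below are placeholders, not the real implementation paths:

```python
# Hypothetical backend selection keyed on DOC_ENGINE; class/module names are
# placeholders for the real connection implementations in common/doc_store/.
import os

def get_doc_store():
    engine = os.environ.get("DOC_ENGINE", "elasticsearch")
    if engine == "elasticsearch":
        from common.doc_store.es_conn import ESConnection            # assumed path
        return ESConnection()
    if engine == "infinity":
        from common.doc_store.infinity_conn import InfinityConnection  # assumed path
        return InfinityConnection()
    raise ValueError(f"Unsupported DOC_ENGINE: {engine}")
```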
Document Ingestion Pipeline Diagram
The pipeline is fully asynchronous with progress tracking. Each chunk gets fields:
- `content_with_weight`: Weighted text content
- `content_ltks`: Tokenized content
- `q_N_vec`: Embedding vector (N = dimension)
- `important_kwd`: Optional LLM-extracted keywords
- `question_kwd`: Optional LLM-generated questions
- `page_num_int`, `top_int`: Position metadata

Sources: rag/svr/task_executor.py:224-471, api/apps/document_app.py:52-86, api/db/services/document_service.py:1-200
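An illustrative chunk record with these fields; the values are invented and the real index carries additional fields:

```python
# Made-up example of a single indexed chunk; field names follow the list above,
# values and the embedding dimension (1024) are illustrative.
chunk = {
    "content_with_weight": "RAGFlow combines deep document understanding...",
    "content_ltks": "ragflow combin deep document understand",  # tokenized form
    "q_1024_vec": [0.012, -0.087, ...],    # embedding; name encodes the dimension
    "important_kwd": ["RAGFlow", "document understanding"],
    "question_kwd": ["What is RAGFlow?"],
    "page_num_int": [1],                   # page where the chunk appears
    "top_int": [120],                      # vertical position on that page
}
```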
| Layer | Technologies |
|---|---|
| Frontend | React 18, UmiJS, Tailwind CSS, React Query, Zustand, Ant Design, Radix UI |
| Backend | Python 3.12, Quart (async Flask), Peewee ORM, asyncio |
| Document Processing | DeepDoc (OCR/Layout), OpenCV, NumPy, pandas |
| NLP/Search | Custom tokenizer, BM25 (via doc store), HNSW (vector search) |
| LLM Integration | OpenAI SDK, LiteLLM, custom adapters for 50+ providers |
| Storage | MySQL 8.0, Redis 7.x, MinIO, Elasticsearch 8.11/Infinity |
| Deployment | Docker Compose, multi-stage builds, environment-based profiles |
| Development | uv (package manager), pre-commit hooks, pytest, Ruff (linting) |
Configuration Files:

- `docker/.env`: Deployment environment variables (ports, `DOC_ENGINE`, credentials)
- `conf/llm_factories.json`: Catalog of supported LLM providers and models

Sources: README.md:143-150, docker/.env:1-100, conf/llm_factories.json:1-50
Docker Deployment: From the `docker/` directory, the stack is typically started with `docker compose -f docker-compose.yml up -d`. This starts:
- `ragflow` container: Runs both `ragflow_server.py` and `task_executor.py`
- Dependency containers for the data stores listed above (MySQL, Redis/Valkey, MinIO, Elasticsearch/Infinity)

Development Mode: In development, the two backend processes are run directly from source.
Key Processes:
- `api/ragflow_server.py`: HTTP API server (Quart app on port 9380)
- `rag/svr/task_executor.py`: Background task processor (multiple workers possible)

Sources: README.md:184-375, docker/docker-compose.yml
For a chat completion request:
1. The client sends `POST /chats/{chat_id}/completions`
2. The endpoint handler dispatches to `async_chat()`, which runs retrieval and streams the generated answer

Sources: api/apps/sdk/session.py:124-173, api/db/services/dialog_service.py:281-441
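A hedged client-side sketch using httpx; the base path, auth header, and payload fields follow common REST conventions and should be checked against the actual API reference:

```python
# Illustrative client call to the completion endpoint; host, path prefix,
# auth scheme, and payload fields are assumptions, not verified contract.
import httpx

resp = httpx.post(
    "http://localhost:9380/api/v1/chats/CHAT_ID/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"question": "What is RAGFlow?", "stream": False},
    timeout=60,
)
print(resp.json())
```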
RAGFlow's architecture provides several extension mechanisms:
- Custom chunking strategies: Add a module in `rag/app/` and register it in the `FACTORY` dict (see the sketch after this list)
- New LLM providers: Subclass the `Base` class in the `rag/llm/` modules and register in the factory
- Alternative document stores: Implement the `DocStoreConnection` interface in `common/doc_store/`
- Custom agent components: Extend `ComponentBase` in `agent/component.py`

Sources: rag/svr/task_executor.py:75-92, rag/llm/__init__.py:25-50, agent/component.py
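For the first extension point, a hedged sketch of a custom chunker; the `chunk()` signature is simplified from the real interface and `my_parser` is a hypothetical name:

```python
# Hypothetical custom chunker module; the signature is modeled on the
# description above and simplified from the real chunking interface.
def chunk(filename: str, binary: bytes, **kwargs) -> list[dict]:
    """Split a raw document into chunk dicts for indexing (simplified)."""
    text = binary.decode("utf-8", errors="ignore")
    return [{"content_with_weight": part}
            for part in text.split("\n\n") if part.strip()]

# Conceptual registration in rag/svr/task_executor.py:
# FACTORY["my_parser"] = my_parser_module   # module exposing this chunk()
```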