
RAG System with LangChain and LangGraph

Last Updated : 24 Oct, 2025

In this article, we will build a Retrieval-Augmented Generation (RAG) system that improves AI answers by combining a large language model with document search. The system reads documents, breaks them into smaller parts and turns them into searchable vectors. When a user asks a question, it uses the relevant context from the documents to produce accurate, context-aware answers. Here we will also use:

  • LangChain: loads and splits documents into chunks, creates vector embeddings to represent text and interfaces with language models to generate answers.
  • LangGraph: controls the order of the retrieval and generation steps, manages state and data flow across the system and enables modular, maintainable AI workflows.
Workflow

As we can see in the workflow,

  • Documents are loaded and read using LangChain.
  • Documents are split into smaller text chunks.
  • Each chunk is converted into a vector embedding for fast searching.
  • When a user asks a question, it is sent to the LangGraph workflow.
  • LangGraph orchestrates the process, using stored vectors and the user query.
  • The system retrieves relevant chunks using LangChain’s search.
  • LangChain and LangGraph together generate a smart answer using a language model.
  • The final answer is presented back to the user.

Step-by-Step Implementation

Let's build a RAG system with the help of LangChain and LangGraph:

Step 1: Install Dependencies

We will install the required packages: langchain, langgraph, langchain-openai, langchain-text-splitters, langchain-community, networkx and matplotlib.

Python
!pip install langchain langgraph langchain-openai langchain-text-splitters langchain-community networkx matplotlib

Step 2: Setup API Keys

We configure the environment variable for the OpenAI API key. This is required to authenticate and access OpenAI models.

  • os.environ["OPENAI_API_KEY"]: Sets the API key as an environment variable so that LangChain/OpenAI libraries can automatically pick it up when calling the model.

Replace "openai_API_key" in the code below with your actual OpenAI API key.

Python
import os

os.environ["OPENAI_API_KEY"] = "openai_API_key"

Step 3: Define the Application State

We define a TypedDict called State to represent the flow of data across our RAG pipeline.

  • question: The user’s query.
  • context: A list of retrieved Document objects relevant to the query.
  • answer: The final generated response from the language model.
Python
from typing_extensions import TypedDict, List
from langchain_core.documents import Document


class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

Step 4: Load and Split Documents

The knowledge base file used in this example can be downloaded from here.

We will load the documents from the JSON file and split them into smaller chunks.

  • RecursiveCharacterTextSplitter: Breaks down long documents into chunks (chunk_size=1000) with overlap (chunk_overlap=200) to maintain context.
Python
import json
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the knowledge base and wrap each item as a Document
with open('knowledge_base.json', 'r') as f:
    knowledge_items = json.load(f)

local_docs = [Document(page_content=item['text']) for item in knowledge_items]

# Split the documents into overlapping chunks for retrieval
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(local_docs)

Step 5: Create Embeddings and Vector Store

We convert the document chunks into embeddings and store them in a vector database for similarity search.

  • OpenAIEmbeddings: Generates embeddings using OpenAI’s embedding model text-embedding-3-large.
  • InMemoryVectorStore: A lightweight in-memory store for embeddings.
  • add_documents: Stores the vector representations of all document chunks.
Python
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore

# Embed the chunks and index them in an in-memory vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store = InMemoryVectorStore(embeddings)
vector_store.add_documents(all_splits)

Output:

List of document chunk IDs returned by the vector store
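
Before wiring the retriever into the workflow, a quick sanity check (the query string below is just a hypothetical example) can confirm that similarity search returns sensible chunks:

Python
# Retrieve the two chunks closest to a sample question
sample_query = "What is this knowledge base about?"  # hypothetical query
for i, doc in enumerate(vector_store.similarity_search(sample_query, k=2), start=1):
    print(f"Result {i}: {doc.page_content[:200]}")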

Step 6: Define Custom Prompt and Initialize LLM Model

Define a custom prompt template that guides the LLM to use the retrieved context to answer user questions clearly and concisely. Initialize the OpenAI GPT-4.1 chat model with temperature 0.3 for controlled creativity in the answers.

Python
from langchain.chat_models import init_chat_model
CUSTOM_PROMPT = """
You are an advanced assistant. Use the context to answer. If insufficient info, say so clearly.

Question: {question}

Context:
{context}

Answer:
"""

llm = init_chat_model("openai:gpt-4.1", temperature=0.3)
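
Optionally, a quick sanity check (with a made-up question and context) confirms the prompt formatting and the model call work before they are wired into the graph:

Python
# Fill the prompt with placeholder values and call the model once
test_prompt = CUSTOM_PROMPT.format(
    question="What does this assistant do?",  # hypothetical question
    context="This assistant answers questions using retrieved document chunks.",  # hypothetical context
)
print(llm.invoke([{"role": "user", "content": test_prompt}]).content)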

Step 7: Define Workflow Functions

Define individual LangGraph node functions for each pipeline step:

  • retrieve(state): Performs a similarity search on the vector store to get the top 5 document chunks related to the question.
  • generate(state): Formats the prompt with the question and retrieved context, invokes the LLM and returns the generated answer.
  • classify(state): Dummy function that identifies "advanced" questions but currently passes the question unchanged.
  • refine(state): Appends a refinement note to the generated answer for clarity.
Python
def retrieve(state: State):
    # Fetch the top 5 chunks most similar to the user's question
    retrieved_docs = vector_store.similarity_search(state["question"], k=5)
    return {"context": retrieved_docs}


def generate(state: State):
    # Fill the prompt with the question and the retrieved context, then call the LLM
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    prompt_filled = CUSTOM_PROMPT.format(
        question=state["question"], context=docs_content)
    response = llm.invoke([{"role": "user", "content": prompt_filled}])
    return {"answer": response.content}


def classify(state: State):
    # Flag "advanced" questions; the flag is unused for now and the question passes through unchanged
    is_advanced = "advanced" in state["question"].lower()
    return {"question": state["question"]}


def refine(state: State):
    # Append a short refinement note to the generated answer
    refined_answer = state["answer"] + \
        "\n\n[Refined for clarity and completeness]"
    return {"answer": refined_answer}

Step 8: Build the LangGraph Workflow

We define the pipeline as a graph using LangGraph.

  • StateGraph(State): Defines a graph where nodes pass along State.
  • add_sequence([classify, retrieve, generate, refine]): Runs classification, retrieval, generation and refinement in order.
  • add_edge(START, "classify"): Connects the start of the graph to the first node.
  • compile(): Finalizes the graph for execution.
Python
from langgraph.graph import START, StateGraph

graph_builder = StateGraph(State).add_sequence(
    [classify, retrieve, generate, refine])
graph_builder.add_edge(START, "classify")
graph = graph_builder.compile()
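
To watch each node fire in order, the compiled graph can also be streamed. A minimal sketch (the question is a hypothetical example) using stream_mode="updates", which yields one dictionary per node keyed by the node's name:

Python
# Stream the graph and print which node produced each state update
for update in graph.stream({"question": "Give a short summary of the documents."},
                           stream_mode="updates"):
    print(list(update.keys()))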

Step 9: Visualize the LangGraph Workflow

Using NetworkX and Matplotlib, we will visualize our workflow:

  • G = nx.DiGraph(): Creates an empty directed graph where edges have direction, modeling workflow steps.
  • nx.draw_networkx_nodes(...): Draws graph nodes with specified colors, sizes and borders.
  • nx.draw_networkx_edges(...): Draws arrows between nodes with custom style and curvature for clarity.
  • plt.title(), plt.tight_layout(), plt.axis('off'): Sets title, adjusts layout and hides axes for a clean plot.
Python
import networkx as nx
import matplotlib.pyplot as plt


def visualize_langgraph_clean(graph_builder):
    G = nx.DiGraph()
    for node_name in graph_builder.nodes:
        G.add_node(node_name)
    for src, tgt in graph_builder.edges:
        G.add_edge(src, tgt)

    try:
        pos = nx.nx_agraph.graphviz_layout(G, prog='dot')
    except Exception:
        pos = nx.spring_layout(G, seed=42, k=1.2)

    node_styles = {
        "__start__": {"color": "#666666", "size": 3500},
        "classify": {"color": "#56c2ff", "size": 3300},
        "retrieve": {"color": "#75ff90", "size": 3300},
        "generate": {"color": "#ff8888", "size": 3300},
        "refine": {"color": "#b996fa", "size": 3500}
    }
    node_colors = [node_styles.get(node, {"color": "#cccccc"})[
        "color"] for node in G.nodes()]
    node_sizes = [node_styles.get(node, {"size": 2700})[
        "size"] for node in G.nodes()]

    nx.draw_networkx_nodes(G, pos, node_color=node_colors,
                           node_size=node_sizes, edgecolors='#303030', alpha=0.93)
    nx.draw_networkx_edges(G, pos, arrows=True, arrowstyle='-', arrowsize=25,
                           width=3, edge_color='#555', alpha=0.75, connectionstyle='arc3,rad=0.08')
    nx.draw_networkx_labels(G, pos, font_size=17,
                            font_weight='bold', font_family='sans-serif')

    plt.title("LangGraph Workflow", fontsize=18, fontweight='bold', pad=15)
    plt.tight_layout()
    plt.axis('off')
    plt.show()


visualize_langgraph_clean(graph_builder)

Output:

LangGraph Workflow

With the help of LangGraph we were able to visualize the workflow of our application, which helps with communication and understanding.
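
As an aside, LangGraph can also render the compiled graph directly. A minimal sketch (this uses an external Mermaid rendering service by default, so it needs network access and works best in a notebook):

Python
from IPython.display import Image, display

# Render the compiled graph as a Mermaid diagram
display(Image(graph.get_graph().draw_mermaid_png()))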

Step 10: Run the System

We take user input, pass it through the graph and display the answer.

  • graph.invoke(): Executes the graph pipeline (classify, retrieve, generate, refine).
Python
print("RAG system is ready. Type 'exit' to quit.")

while True:
    question = input("Enter your question: ")
    if question.lower() in ("exit", "quit", "stop"):
        print("Exiting program. Goodbye!")
        break

    response = graph.invoke({"question": question})
    answer = response.get("answer", "No answer generated.")
    print("\nAnswer:\n")
    print(answer)
    print("\n" + "=" * 90 + "\n")

Output:

Model output for the user query

Advantages

Let's see the advantages that are offered by this system:

  • Grounded Responses: Unlike vanilla LLMs that may hallucinate, RAG grounds the model’s answers in actual documents, improving factual accuracy.
  • Domain Adaptability: We can easily load custom datasets (PDFs, web pages, internal notes) and make the system specialized for finance, healthcare, legal, research, etc., as sketched after this list.
  • Up-to-date Knowledge: The system retrieves the latest information from external sources, overcoming the LLM’s fixed training cutoff.
  • Efficient Context Management: By using document chunking and vector search, the model only processes the most relevant text instead of the entire dataset, reducing token costs and speeding up inference.
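
As an example of domain adaptability, here is a minimal sketch (assuming a hypothetical PDF file name and the pypdf package installed) that indexes a custom PDF into the same vector store:

Python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load a domain-specific PDF and split it with the same chunking settings
loader = PyPDFLoader("my_domain_docs.pdf")  # hypothetical file name
pdf_splits = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200).split_documents(loader.load())

# Add the new chunks to the existing vector store
vector_store.add_documents(pdf_splits)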
