This project aims to create a QA bot for National Central University (NCU) campus information using Retrieval-Augmented Generation (RAG) and integrate it with a LINE Bot.
- `crawler/`: Scripts for crawling data from various university websites.
- `server/`: The RAG server (FastAPI application) and related Docker Compose configurations.
- `server/rag_server/`: The core RAG server Python application.
- `server/linebot/`: The LINE Bot application.
- `server/open_webui/`: (Potentially for a web UI; currently a venv.)
- Docker and Docker Compose
- Python 3.8+ (for local development/testing of scripts)
Both the rag_server and linebot require environment variables. Create one `.env` file at `server/rag_server/.env` and another at `server/linebot/.env`.
server/rag_server/.env:
GEMINI_API_KEY=your_gemini_api_key_here
Replace your_gemini_api_key_here with your actual Gemini API Key obtained from Google AI Studio.
server/linebot/.env:
LINE_CHANNEL_ACCESS_TOKEN=your_line_channel_access_token_here
LINE_CHANNEL_SECRET=your_line_channel_secret_here
Replace with your LINE Bot channel access token and secret.
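To verify those files will be picked up, here is a minimal stdlib sketch of loading a `.env` file (the real services may rely on a library such as python-dotenv; `load_env_file` is our own helper name, not project code):

```python
import os

def load_env_file(path):
    """Minimal .env loader: copy KEY=VALUE lines into os.environ.

    Skips blank lines and comments; existing environment variables win.
    (A stand-in for a proper loader such as python-dotenv.)
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# Usage sketch (run from the repository root):
#   load_env_file("server/rag_server/.env")
#   assert os.environ.get("GEMINI_API_KEY"), "GEMINI_API_KEY is not set"
```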
The crawler/ directory contains Python scripts (e.g., in crawler/adm/) for fetching data from different university departments. The crawled data is stored in crawler/docs/.
To run the crawlers, navigate to the crawler directory and execute the relevant Python scripts. For example:
cd crawler
python adm/news/app.py
# ... run other crawler scripts ...

After running the crawlers, the crawler/docs directory should be populated with data (CSV and PDF files).
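To sanity-check the crawler output, a short sketch that lists what landed in `crawler/docs` (the helper name is ours; the directory layout matches the description above):

```python
from pathlib import Path

def collect_crawled_docs(docs_dir="crawler/docs"):
    """Return the CSV and PDF files the crawlers produced, sorted by path."""
    root = Path(docs_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.suffix.lower() in {".csv", ".pdf"})

# Usage sketch:
#   for doc in collect_crawled_docs():
#       print(doc)
```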
The application is designed to work flexibly with Ollama, whether it's running locally on your host machine or within a Docker container managed by Docker Compose. You can choose the setup that best suits your needs.
Make sure your server/rag_server/.env file has the OLLAMA_BASE_URL variable. This will be the key to switching between a Dockerized Ollama and a local one.
# server/rag_server/.env
# ... other variables ...
# Set this to point to your Ollama instance.
# Leave it unset if you want the application to automatically detect the Ollama URL.
# (Defaults to http://ollama:11434 in Docker, http://localhost:11434 locally)
# OLLAMA_BASE_URL=
Note: If you are running the rag-server or db-builder in Docker and want to connect to a host-based Ollama, you might need to set OLLAMA_BASE_URL=http://host.docker.internal:11434 in your server/rag_server/.env file and ensure your host Ollama is listening on 0.0.0.0 (start with OLLAMA_HOST=0.0.0.0 ollama serve).
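The fallback behaviour described above can be sketched as follows (the `/.dockerenv` check is our assumption about how in-container detection might work, not the project's actual code):

```python
import os

def resolve_ollama_base_url():
    """Resolve the Ollama URL as described above.

    An explicit OLLAMA_BASE_URL wins; otherwise fall back to the Docker
    service name when (we assume) running inside a container, else localhost.
    """
    explicit = os.environ.get("OLLAMA_BASE_URL")
    if explicit:
        return explicit
    if os.path.exists("/.dockerenv"):  # assumed in-container heuristic
        return "http://ollama:11434"
    return "http://localhost:11434"
```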
This is the recommended, fully portable setup. It spins up both your application services and an Ollama container, automatically pulling the required models.
# In the 'server' directory, run:
docker-compose -f docker-compose.yml -f docker-compose.ollama.yml up --build

- How it works: This command starts `rag-server`, `linebot`, and the `ollama` container. The Python code automatically connects to the `ollama` service at `http://ollama:11434` within the Docker network.
This is for when you want your application in Docker but prefer to use an Ollama instance running directly on your host machine.
- Ensure your local Ollama is running and accessible:
  - Start your local Ollama server with `OLLAMA_HOST=0.0.0.0 ollama serve`.
  - Make sure the `qwen3-embedding:0.6b` model is pulled (`ollama pull qwen3-embedding:0.6b`).
- Set `OLLAMA_BASE_URL` in `.env`:
  # in server/rag_server/.env
  OLLAMA_BASE_URL=http://host.docker.internal:11434
- Run Docker Compose (without the ollama service file):
  # In the 'server' directory, run:
  docker-compose up --build
- How it works: The `rag-server` Docker container reads `OLLAMA_BASE_URL` from its environment and connects back to your host machine's Ollama instance.
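Under any of these options, the embedding calls ultimately hit Ollama's HTTP API. A minimal sketch using Ollama's standard `/api/embeddings` endpoint (the project's actual client code may use a library wrapper instead):

```python
import json
import urllib.request

OLLAMA_EMBED_MODEL = "qwen3-embedding:0.6b"

def build_embed_request(base_url, prompt):
    """Build a POST to Ollama's /api/embeddings endpoint (no network I/O here)."""
    payload = json.dumps({"model": OLLAMA_EMBED_MODEL, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed(base_url, prompt):
    """Send the request and return the embedding vector (requires a running Ollama)."""
    with urllib.request.urlopen(build_embed_request(base_url, prompt)) as resp:
        return json.load(resp)["embedding"]
```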
This is for running the Python scripts directly on your machine, connecting to a local Ollama instance.
- Ensure your local Ollama is running and accessible:
  - Start your local Ollama server with `OLLAMA_HOST=0.0.0.0 ollama serve`.
  - Make sure the `qwen3-embedding:0.6b` model is pulled (`ollama pull qwen3-embedding:0.6b`).
- Navigate to the `rag_server` directory and activate the venv:
  cd server/rag_server
  source venv/bin/activate
- Ensure `OLLAMA_BASE_URL` is unset or set to `http://localhost:11434` in your environment (or `server/rag_server/.env`):
  # in server/rag_server/.env
  OLLAMA_BASE_URL=http://localhost:11434
- Run the server:
  python server.py
  (To build the database locally, you would run `python DBHandler.py`.)
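Conceptually, answering a query means embedding the question, retrieving the most similar stored chunks, and prompting Gemini with them. The retrieval step can be sketched with a toy cosine-similarity ranking (the real server delegates this to Chroma; `retrieve` and the tiny vectors below are illustrative only):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, corpus, k=3):
    """Rank (doc_id, vector) pairs by similarity to the query; return top-k ids."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```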
This is less common but demonstrates the flexibility.
- Start ONLY the Ollama container:
  # In the 'server' directory, run:
  docker-compose -f docker-compose.ollama.yml up --build
- Navigate to the `rag_server` directory and activate the venv:
  cd server/rag_server
  source venv/bin/activate
- Ensure `OLLAMA_BASE_URL` points to your host's exposed Ollama port:
  # in server/rag_server/.env
  OLLAMA_BASE_URL=http://localhost:11434
- Run the server:
  python server.py
- How it works: Your local Python script connects to `localhost:11434`, which is the port exposed by the Ollama container on your host machine.
The RAG database needs to be built before the rag-server can function. This process involves:
- Collecting documents from `crawler/docs`.
- Processing these documents (text extraction from PDFs, parsing CSVs).
- Generating embeddings for the processed text.
- Storing the embeddings in a Chroma vector database.
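The processing step above typically also splits extracted text into overlapping chunks before embedding. A sketch, with illustrative sizes (the actual chunking logic and parameters live in `DBHandler.py` and may differ):

```python
def chunk_text(text, size=500, overlap=50):
    """Split extracted text into overlapping character chunks for embedding.

    The overlap keeps sentences that straddle a boundary retrievable from
    either neighboring chunk. Sizes here are illustrative, not the project's.
    """
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```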
We use a Docker Compose service to run the build_database.sh script in an isolated environment.
- Ensure your `.env` file is configured: make sure `server/rag_server/.env` contains your `GEMINI_API_KEY`.
- Navigate to the `server` directory: `cd server`
- Run the database build command (choose one):
  - Using a Dockerized Ollama (recommended for portability):
    docker-compose -f docker-compose.build.yml -f docker-compose.ollama.yml up --build --remove-orphans
  - Using a localhost Ollama (requires the host Ollama running with `OLLAMA_HOST=0.0.0.0`, and `OLLAMA_BASE_URL=http://host.docker.internal:11434` set in `server/rag_server/.env`):
    docker-compose -f docker-compose.build.yml up --build --remove-orphans

This command will:
- Build the `db-builder` Docker image.
- Start a temporary container.
- Mount the necessary project directories (including `crawler/docs` and `server/rag_server`) into the container.
- Execute the `build_database.sh` script, which creates the `chroma_db` directory inside `server/rag_server`.

The `--remove-orphans` flag ensures that any old, unused containers are cleaned up.
- RAG Server Health Check:
curl http://localhost:8000/health
- LINE Bot: Interact with your LINE Bot via the LINE app. Ensure it's configured to point to your deployed `linebot` service.
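When verifying the LINE Bot, note that LINE signs every webhook request: the `X-Line-Signature` header carries a base64-encoded HMAC-SHA256 of the raw body, keyed with your channel secret. A stdlib sketch of the check the `linebot` service must perform (the function name is ours; the LINE SDK normally does this for you):

```python
import base64
import hashlib
import hmac

def valid_line_signature(channel_secret, body, signature):
    """Verify X-Line-Signature: base64(HMAC-SHA256(channel_secret, raw body))."""
    digest = hmac.new(channel_secret.encode("utf-8"), body, hashlib.sha256).digest()
    expected = base64.b64encode(digest).decode("ascii")
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(expected, signature)
```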
- `bash: build_database.sh: No such file or directory`: Ensure you are running the `docker-compose` command from the `server` directory and that `docker-compose.build.yml` is correctly configured.
- `ERROR: Could not find a version that satisfies the requirement pysqlite3-binary`: This indicates an environment issue. Ensure your Python environment is compatible, or try installing `sqlite3-binary` manually if necessary.
- "無法回答此問題" ("unable to answer this question") from the bot:
  - Check the `rag-server` logs to see which documents are being retrieved.
  - Ensure your `crawler/docs` directory is populated with relevant data.
  - Rebuild the database after making any changes to documents or `DBHandler.py`.
Please note: The Docker Compose setup assumes a specific project structure. If you modify the directory layout, you may need to adjust the volumes and build.context paths in the docker-compose files accordingly.