This is a GitHub Documentation Search Sample App built with TimescaleDB, pgvectorscale, and Python FastAPI.

This Sample App lets you get a document search app set up and running on your local machine with minimal effort. It loads data from text files, chunks your data, creates embeddings, and performs semantic search over the data.

You don't need to know anything about TimescaleDB, pgvectorscale, or even Postgres to get this running. You can just focus on building out the frontend and gathering your own documentation!
Built by Jacky Liang
## Python Environment

Ensure you have Python 3.7+ installed on your system.

It's recommended to use a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
```
## Install Dependencies

Create a `requirements.txt` file with the following content:

```
fastapi
uvicorn
psycopg2-binary
anthropic
openai
langchain
numpy
pydantic
```

Install the dependencies:

```bash
pip install -r requirements.txt
```
## Database Setup

- Create a new TimescaleDB instance on Timescale Cloud.
- Set a password for the database user.
- Make sure to set the correct `DATABASE_URL` in the `.env` file.
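As a quick sanity check before the app starts, you can parse the `DATABASE_URL` into its parts with only the standard library. The commented psycopg2 lines are a sketch of how you might then verify connectivity and enable pgvectorscale (its extension name is `vectorscale`, per Timescale's docs), assuming your role has the required privileges; this helper is not part of `docs_server.py` itself:

```python
from urllib.parse import urlparse

def parse_database_url(url):
    """Split a postgresql:// DATABASE_URL into psycopg2 connection parts.

    A stdlib-only helper for sanity-checking the .env value; not part
    of the sample app itself.
    """
    p = urlparse(url)
    return {
        "user": p.username,
        "password": p.password,
        "host": p.hostname,
        "port": p.port or 5432,
        "dbname": p.path.lstrip("/"),
    }

# With psycopg2-binary (already in requirements.txt) you could then
# verify the connection and enable the extension -- a sketch, assuming
# CREATE EXTENSION privileges:
#
# import os, psycopg2
# conn = psycopg2.connect(**parse_database_url(os.environ["DATABASE_URL"]))
# with conn.cursor() as cur:
#     cur.execute("CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;")
# conn.commit()
```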
## Environment Variables

Create a `.env` file in the same directory as `docs_server.py` with the following content:

```
DATABASE_URL=postgresql://username:password@localhost:5432/your_database_name
ANTHROPIC_API_KEY=your_anthropic_api_key
OPENAI_API_KEY=your_openai_api_key
```

Replace the placeholders with your actual database credentials and API keys.

Alternatively, you can set the environment variables in your operating system. For example, on Linux or macOS:

```bash
export DATABASE_URL=postgresql://username:password@localhost:5432/your_database_name
export ANTHROPIC_API_KEY=your_anthropic_api_key
export OPENAI_API_KEY=your_openai_api_key
```
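If you'd rather not export variables by hand, a minimal `.env` loader takes only a few lines of standard-library Python. This is a sketch — `docs_server.py` may already load the file its own way (for example via the python-dotenv package):

```python
import os

def load_dotenv(path=".env"):
    """Minimal .env loader: read KEY=VALUE lines into os.environ.

    A stdlib-only sketch; existing environment variables win, and
    blank lines and # comments are skipped.
    """
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```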
## Running the Server

Start the server with:

```bash
python docs_server.py
```

The server will start on `http://localhost:8000`.
## Using the API

Note that a live playground for the API docs is also available at `http://localhost:8000/docs`.

Load data:

```bash
curl -X POST "http://localhost:8000/load_data" \
  -H "Content-Type: application/json" \
  -d '{"file_path": "/path/to/your/data/file.txt"}'
```

Query the data:

```bash
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do you install pgvectorscale?"}'
```

View the first few documents:

```bash
curl "http://localhost:8000/head_documents"
```
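The same calls can be made from Python using only the standard library. This is a sketch of a thin client for the endpoints above; the request bodies mirror the curl examples:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def post_json(endpoint, payload):
    """Build a JSON POST request for one of the sample app's endpoints.

    Returns a prepared urllib.request.Request; pass it to
    urllib.request.urlopen() to actually send it.
    """
    return urllib.request.Request(
        f"{BASE_URL}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example (requires the server to be running):
# req = post_json("/query", {"question": "How do you install pgvectorscale?"})
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```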
- Go to a GitHub repo of your choice
- Click on the `README.md` file
- Click on the `Raw` button
- Copy the contents
- Paste the contents into a text file
- Use the `/load_data` endpoint to load the data into the database
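The manual steps above can also be scripted. This sketch builds the raw.githubusercontent.com URL that the `Raw` button points at; the default branch name is an assumption (older repos use `master` rather than `main`):

```python
def raw_readme_url(owner, repo, branch="main"):
    """Build the raw.githubusercontent.com URL for a repo's README.md.

    This is the URL behind GitHub's "Raw" button; branch defaults to
    "main", which may not match every repository.
    """
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/README.md"

# Example (requires network access):
# import urllib.request
# text = urllib.request.urlopen(raw_readme_url("timescale", "pgvectorscale")).read().decode("utf-8")
# with open("pgvectorscale-readme.txt", "w") as f:
#     f.write(text)  # then POST this file's path to /load_data
```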
Some GitHub docs files are included in `/sample-data/`, containing READMEs for a few example repositories.

Feel free to add your own GitHub `README.md` files to the `/sample-data/` folder.
- Use pgai to generate embeddings and responses
- Use a better chunking algorithm, like semantic chunking
- Automatically load `README.md` files from GitHub repositories
- Implement one-click deployment to Vercel
Feel free to file an issue or create a PR. Thank you for contributing to open source.
- The script assumes you have the necessary permissions to create extensions and tables in your TimescaleDB instance.
- Adjust the `CharacterTextSplitter` parameters in the `read_custom_dataset` function if you need different chunking behavior.
- The script uses OpenAI's embedding model and Anthropic's Claude model. Ensure you have sufficient API credits.
- The `index.html` file is a simple React frontend that allows you to search the documentation. It uses FastAPI's `query` endpoint to get search results.
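For reference, the character-based splitting that `CharacterTextSplitter` performs amounts to fixed-size windows with overlap. A minimal stdlib sketch — the sizes below are illustrative, not the app's actual settings:

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character chunks with overlap.

    Overlapping windows help a chunk keep the context of its
    neighbors; larger overlap means more duplication in the index.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```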
