Github Documentation Search Sample App

This is a Github Documentation Search Sample App built with Timescale DB, pgvectorscale, and Python FastAPI.

This Sample App lets you quickly set up and run a document search app on your local machine. It also lets you load data from text files, split the data into chunks, create embeddings, and perform semantic search over the data.

You don't need to know anything about Timescale DB, pgvectorscale, or even Postgres to get this running. You can just focus on building out the frontend and gathering your own documentation!
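Conceptually, semantic search embeds the question and ranks the stored chunks by vector similarity. In this app that search presumably runs inside Postgres through pgvectorscale; the pure-Python sketch below only illustrates the ranking idea with cosine similarity and a brute-force scan. The tiny 3-dimensional vectors are hypothetical stand-ins for real OpenAI embeddings, and `cosine_similarity` and `rank_chunks` are illustrative names, not functions from docs_server.py.

```python
import math
from typing import List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_chunks(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Brute-force scan: score every (text, embedding) pair and keep the top_k."""
    scored = [(text, cosine_similarity(query_vec, emb)) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical 3-dimensional "embeddings"; real ones come from an embedding model.
chunks = [
    ("pgvectorscale installation steps", [0.9, 0.1, 0.0]),
    ("FastAPI routing overview", [0.0, 1.0, 0.2]),
    ("Building pgvectorscale from source", [0.8, 0.2, 0.1]),
]
query = [1.0, 0.0, 0.0]  # pretend embedding of "How do you install pgvectorscale?"

for text, score in rank_chunks(query, chunks):
    print(f"{score:.3f}  {text}")
```

A linear scan like this is fine for a handful of chunks; pgvectorscale exists to replace it with an approximate nearest-neighbor index once the table grows.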

Built by Jacky Liang

Installation and Setup Instructions

Python Environment
- Ensure you have Python 3.7+ installed on your system.
- It's recommended to use a virtual environment:
```
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
```

Install Dependencies

Create a requirements.txt file with the following content:

```
fastapi
uvicorn
psycopg2-binary
anthropic
openai
langchain
numpy
pydantic
```

Install the dependencies:
```
pip install -r requirements.txt
```

Database Setup
- Create a new Timescale DB instance on Timescale Cloud.
- Set a password for the database user, then put the instance's connection string in the DATABASE_URL variable of your .env file, as described in the next section.

Environment Variables
- Create a .env file in the same directory as docs_server.py with the following content:
```
DATABASE_URL=postgresql://username:password@localhost:5432/your_database_name
ANTHROPIC_API_KEY=your_anthropic_api_key
OPENAI_API_KEY=your_openai_api_key
```
- Replace the placeholders with your actual database credentials and API keys.
- Alternatively, you can set these variables directly in your shell. For example, on Linux or macOS:
```
export DATABASE_URL=postgresql://username:password@localhost:5432/your_database_name
export ANTHROPIC_API_KEY=your_anthropic_api_key
export OPENAI_API_KEY=your_openai_api_key
```
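A minimal sketch of reading and checking these variables at startup, assuming they are already present in the process environment (for example via the export commands above). How docs_server.py actually loads the .env file is not shown in this README, and the helper names `load_settings` and `describe_database_url` are hypothetical. Failing fast on a missing variable gives a clearer error than a failed database connection or API call later.

```python
import os
from urllib.parse import urlparse

REQUIRED_VARS = ("DATABASE_URL", "ANTHROPIC_API_KEY", "OPENAI_API_KEY")

def load_settings(environ=os.environ):
    """Collect the required variables, raising if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}

def describe_database_url(url):
    """Split a postgresql:// URL into its parts, leaving the password out."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }

if __name__ == "__main__":
    settings = load_settings()
    # Print the parsed URL (without the password) to confirm it was read correctly.
    print(describe_database_url(settings["DATABASE_URL"]))
```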

Running the Server
- Start the server with:
```
python docs_server.py
```
- The server will start on http://localhost:8000.

Using the API

An interactive API playground (FastAPI's auto-generated docs) is also available at http://localhost:8000/docs.

Load data:

```
curl -X POST "http://localhost:8000/load_data" \
     -H "Content-Type: application/json" \
     -d '{"file_path": "/path/to/your/data/file.txt"}'
```

Query the data:

```
curl -X POST "http://localhost:8000/query" \
     -H "Content-Type: application/json" \
     -d '{"question": "How do you install pgvectorscale?"}'
```

View the first few documents:

```
curl "http://localhost:8000/head_documents"
```
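The same three calls can be made from Python. This sketch uses only the standard library (`urllib.request`) and assumes the server is running at the default http://localhost:8000 and that each endpoint returns a JSON body (FastAPI's usual behavior, not verified against docs_server.py). The helper names `build_request` and `call` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from "Running the Server"

def build_request(path, payload=None):
    """POST with a JSON body when a payload is given, otherwise a plain GET."""
    url = BASE_URL + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def call(path, payload=None):
    """Send the request and decode the (assumed) JSON response."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))

if __name__ == "__main__":
    print(call("/load_data", {"file_path": "/path/to/your/data/file.txt"}))
    print(call("/query", {"question": "How do you install pgvectorscale?"}))
    print(call("/head_documents"))
```

Separating `build_request` from `call` keeps the request construction testable without a running server.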

How to get GitHub documentation data

- Go to a GitHub repo of your choice.
- Click on the README.md file.
- Click on the Raw button.
- Copy the contents.
- Paste the contents into a text file.
- Use the /load_data endpoint to load the data into the database.
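The manual steps above can be scripted. This sketch downloads a README over HTTPS from raw.githubusercontent.com (the host that serves GitHub's Raw view) and saves it as a text file you can pass to /load_data. The helper names, the default `main` branch, and the example repository are assumptions; the path you send to /load_data must also be readable by the server process.

```python
import urllib.request
from pathlib import Path

def raw_readme_url(owner, repo, branch="main", filename="README.md"):
    """Build a raw.githubusercontent.com URL for one file on one branch."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{filename}"

def save_text(text, dest):
    """Write text to a UTF-8 file (creating parent folders) and return its absolute path."""
    path = Path(dest)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return str(path.resolve())

def fetch_readme(owner, repo, dest, branch="main"):
    """Download a repository's README.md and save it for /load_data."""
    with urllib.request.urlopen(raw_readme_url(owner, repo, branch)) as resp:
        text = resp.read().decode("utf-8")
    return save_text(text, dest)

if __name__ == "__main__":
    # Example repository and branch are assumptions; adjust to your target repo.
    print(fetch_readme("timescale", "pgvectorscale", "sample-data/pgvectorscale.txt"))
```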

Some GitHub README files are included in the /sample-data/ folder.

Feel free to add your own GitHub README.md files to the /sample-data/ folder.

Further developments

- Use pgai to generate embeddings and generate responses
- Use a better chunking algorithm like semantic chunking
- Automatically load README.md files from GitHub repositories
- Implement one-click deployment to Vercel

Contributing

Feel free to file an issue or create a PR. Thank you for contributing to open source.

Notes

- The server script (docs_server.py) assumes you have the necessary permissions to create extensions and tables in your Timescale DB database.
- Adjust the CharacterTextSplitter parameters in the read_custom_dataset function if you need different chunking behavior.
- The script uses OpenAI's embedding model and Anthropic's Claude model. Ensure you have sufficient API credits for both.
- The index.html file is a simple React frontend for searching the documentation. It sends requests to the FastAPI server's /query endpoint to get search results.
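To see how the two main CharacterTextSplitter parameters interact, here is a simplified sketch of fixed-size character chunking with overlap. It is not LangChain's algorithm (CharacterTextSplitter first splits on a separator and then packs the pieces into chunks), and the default values below are illustrative, not the ones used in read_custom_dataset. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into windows of at most chunk_size characters.

    Each chunk starts chunk_size - chunk_overlap characters after the previous
    one, so the last chunk_overlap characters of a chunk are repeated at the
    start of the next. Text near a boundary therefore appears in both chunks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

for i, chunk in enumerate(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)):
    print(i, chunk)
# 0 abcd
# 1 defg
# 2 ghij
```

A larger overlap preserves more context across boundaries at the cost of more chunks, and therefore more embedding calls.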
