A FastAPI-based REST API for serving GloVe word embeddings trained on the Wikipedia and Gigaword dataset.
- Get word embeddings for single words
- Batch process multiple words
- Find similar words based on embedding similarity
- 300-dimensional word vectors from Hugging Face model hub
- FastAPI-powered with automatic Swagger documentation
- Automated CI/CD pipeline with GitHub Actions
- Docker support
- Code quality checks (Black, isort, Flake8)
- Comprehensive test suite
No special prerequisites are required. The model is downloaded automatically from the Hugging Face model hub on first startup.
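For reference, the snippet below is a minimal sketch of loading equivalent 300-dimensional GloVe vectors, using gensim's downloader as a stand-in for the service's own Hugging Face download; the dataset name and loading path are illustrative and may not match the application's actual startup code.

```python
# Illustrative only: "glove-wiki-gigaword-300" is gensim's name for the same
# Wikipedia + Gigaword 300-d vectors. The service itself fetches its weights
# from the Hugging Face model hub, so its loading code may differ.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-300")  # downloads and caches on first call

print(vectors["king"].shape)                 # (300,)
print(vectors.most_similar("king", topn=5))  # nearest words by cosine similarity
```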
- Clone the repository:

  ```bash
  git clone https://your-repository-url.git
  cd your-repository-name
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
For detailed API documentation, see API Documentation.
- `GET /` - API information
- `GET /health` - API health check
- `POST /embedding` - Get embedding for a single word
- `POST /embeddings` - Get embeddings for multiple words
- `GET /similar/{word}` - Get similar words
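As a quick illustration, here is a hedged client example for the single-word endpoint using `requests`; the JSON field names (`word`, `embedding`) and the port are assumptions about the schema and deployment, not taken from the actual API documentation.

```python
# Hypothetical client call; "word" and "embedding" are assumed field names,
# and the port assumes the server is listening on 8080.
import requests

resp = requests.post("http://localhost:8080/embedding", json={"word": "king"})
resp.raise_for_status()
embedding = resp.json().get("embedding", [])
print(len(embedding))  # expected: 300 for the 300-d GloVe vectors
```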
Format your code:

```bash
black .
isort .
```

Run linting:

```bash
flake8 .
```

Run tests with coverage:

```bash
pytest tests/ -v --cov=app --cov-report=term-missing
```

Run the development server:

```bash
uvicorn app:app --reload
```

Run the production server:

```bash
uvicorn app:app --host 0.0.0.0 --port 8080
```

Available endpoints:

- `GET /` - API information
- `POST /embedding` - Get embedding for a single word
- `POST /embeddings` - Get embeddings for multiple words
- `GET /similar/{word}` - Get similar words
- `POST /tokenize` - Tokenize text using GPT-3/4 tokenizers
- `GET /available-tokenizers` - Get a list of available tokenizers
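For the similarity and tokenizer endpoints, a hedged sketch of client calls is shown below; the payload keys (`text`, `tokenizer`) are guesses about the request schema rather than the documented fields.

```python
# Hypothetical calls to the similarity and tokenizer endpoints; the
# "text"/"tokenizer" payload keys are assumptions, not the confirmed schema.
import requests

BASE = "http://localhost:8080"

similar = requests.get(f"{BASE}/similar/king").json()
tokenizers = requests.get(f"{BASE}/available-tokenizers").json()
tokens = requests.post(
    f"{BASE}/tokenize",
    json={"text": "GloVe embeddings are fun", "tokenizer": "gpt-4"},
).json()

print(similar, tokenizers, tokens)
```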
This project is configured for deployment on Replit. Simply import the repository and click Run.
- Build the Docker image:

  ```bash
  docker build -t glove-api .
  ```

- Run the container:

  ```bash
  docker run -p 8080:8080 glove-api
  ```
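A small smoke test once the container is up, assuming the `-p 8080:8080` mapping above and that the health endpoint returns JSON:

```python
# Minimal smoke test for the running container; port mapping and the JSON
# response of /health are assumptions based on the commands and endpoint list.
import requests

r = requests.get("http://localhost:8080/health", timeout=5)
print(r.status_code, r.json())
```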
The project includes a GitHub Actions workflow that:
- Runs on every push and pull request to main/master branches
- Checks code formatting with Black
- Validates imports with isort
- Runs linting with Flake8
- Executes the test suite
- Deploys to Replit (on main/master)
- Builds and pushes Docker image (on main/master)
Required secrets for deployment:
- `REPL_ID`: Your Replit project ID
- `REPLIT_TOKEN`: Your Replit authentication token
- `DOCKERHUB_USERNAME`: Your Docker Hub username
- `DOCKERHUB_TOKEN`: Your Docker Hub access token
Pre-trained GloVe vectors based on the Wikipedia and Gigaword dataset:
- 6B tokens
- 400K vocab
- 300d vectors
- Uncased
This repository uses Git LFS to handle large files. The following file types are tracked by Git LFS:
- `.model` - Model files
- `.npy` - NumPy array files
- `.bin` - Binary files
- `.vec` - Vector files
- `.gz` - Compressed files
When cloning the repository, make sure Git LFS is installed and run `git lfs pull` to download the model files.
MIT