A FastAPI-based REST API for serving GloVe word embeddings trained on the Wikipedia and Gigaword dataset.
- Get word embeddings for single words
- Batch process multiple words
- Find similar words based on embedding similarity
- 300-dimensional word vectors from Hugging Face model hub
- FastAPI-powered with automatic Swagger documentation
- Automated CI/CD pipeline with GitHub Actions
- Docker support
- Code quality checks (Black, isort, Flake8)
- Comprehensive test suite
No special prerequisites are required. The model is downloaded automatically from the Hugging Face model hub on first startup.
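For reference, the snippet below is a minimal sketch of loading equivalent 300-dimensional GloVe vectors, using gensim's downloader as a stand-in for the service's own Hugging Face download; the dataset name and loading path are illustrative and may not match the application's actual startup code.

```python
# Illustrative only: "glove-wiki-gigaword-300" is gensim's name for the same
# Wikipedia + Gigaword 300-d vectors. The service itself fetches its weights
# from the Hugging Face model hub, so its loading code may differ.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-300")  # downloads and caches on first call

print(vectors["king"].shape)                 # (300,)
print(vectors.most_similar("king", topn=5))  # nearest words by cosine similarity
```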
- Clone the repository:

  ```bash
  git clone https://your-repository-url.git
  cd your-repository-name
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
For detailed API documentation, see API Documentation.
- `GET /` - API information
- `GET /health` - API health check
- `POST /embedding` - Get embedding for a single word
- `POST /embeddings` - Get embeddings for multiple words
- `GET /similar/{word}` - Get similar words
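As a quick illustration, here is a hedged client example for the single-word endpoint using `requests`; the JSON field names (`word`, `embedding`) and the port are assumptions about the schema and deployment, not taken from the actual API documentation.

```python
# Hypothetical client call; "word" and "embedding" are assumed field names,
# and the port assumes the server is listening on 8080.
import requests

resp = requests.post("http://localhost:8080/embedding", json={"word": "king"})
resp.raise_for_status()
embedding = resp.json().get("embedding", [])
print(len(embedding))  # expected: 300 for the 300-d GloVe vectors
```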
Format your code:

```bash
black .
isort .
```

Run linting:

```bash
flake8 .
```

Run tests with coverage:

```bash
pytest tests/ -v --cov=app --cov-report=term-missing
```

Run the development server:

```bash
uvicorn app:app --reload
```

Run the production server:

```bash
uvicorn app:app --host 0.0.0.0 --port 8080
```

Available endpoints:

- `GET /` - API information
- `POST /embedding` - Get embedding for a single word
- `POST /embeddings` - Get embeddings for multiple words
- `GET /similar/{word}` - Get similar words
- `POST /tokenize` - Tokenize text using GPT-3/4 tokenizers
- `GET /available-tokenizers` - Get a list of available tokenizers
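For the similarity and tokenizer endpoints, a hedged sketch of client calls is shown below; the payload keys (`text`, `tokenizer`) are guesses about the request schema rather than the documented fields.

```python
# Hypothetical calls to the similarity and tokenizer endpoints; the
# "text"/"tokenizer" payload keys are assumptions, not the confirmed schema.
import requests

BASE = "http://localhost:8080"

similar = requests.get(f"{BASE}/similar/king").json()
tokenizers = requests.get(f"{BASE}/available-tokenizers").json()
tokens = requests.post(
    f"{BASE}/tokenize",
    json={"text": "GloVe embeddings are fun", "tokenizer": "gpt-4"},
).json()

print(similar, tokenizers, tokens)
```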
This project is configured for deployment on Replit. Simply import the repository and click Run.
- Build the Docker image:

  ```bash
  docker build -t glove-api .
  ```

- Run the container:

  ```bash
  docker run -p 8080:8080 glove-api
  ```
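A small smoke test once the container is up, assuming the `-p 8080:8080` mapping above and that the health endpoint returns JSON:

```python
# Minimal smoke test for the running container; port mapping and the JSON
# response of /health are assumptions based on the commands and endpoint list.
import requests

r = requests.get("http://localhost:8080/health", timeout=5)
print(r.status_code, r.json())
```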
The project includes a GitHub Actions workflow that:
- Runs on every push and pull request to main/master branches
- Checks code formatting with Black
- Validates imports with isort
- Runs linting with Flake8
- Executes the test suite
- Deploys to Replit (on main/master)
- Builds and pushes Docker image (on main/master)
Required secrets for deployment:
- `REPL_ID`: Your Replit project ID
- `REPLIT_TOKEN`: Your Replit authentication token
- `DOCKERHUB_USERNAME`: Your Docker Hub username
- `DOCKERHUB_TOKEN`: Your Docker Hub access token
Pre-trained GloVe vectors based on the Wikipedia and Gigaword dataset:
- 6B tokens
- 400K vocab
- 300d vectors
- Uncased
This repository uses Git LFS to handle large files. The following file types are tracked by Git LFS:
- `.model` - Model files
- `.npy` - NumPy array files
- `.bin` - Binary files
- `.vec` - Vector files
- `.gz` - Compressed files
When cloning the repository, make sure Git LFS is installed and run `git lfs pull` to download the model files.
MIT