neill-k/w2vapi

GloVe Word Embeddings API

A FastAPI-based REST API that serves GloVe word embeddings trained on the Wikipedia and Gigaword datasets.

Features

  • Get word embeddings for single words
  • Batch process multiple words
  • Find similar words based on embedding similarity
  • 300-dimensional word vectors loaded from the Hugging Face model hub
  • FastAPI-powered with automatic Swagger documentation
  • Automated CI/CD pipeline with GitHub Actions
  • Docker support
  • Code quality checks (Black, isort, Flake8)
  • Comprehensive test suite
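
The "similar words" feature listed above is commonly implemented as a cosine-similarity ranking over the embedding vectors. The following is a minimal sketch of that idea using only the standard library; it is not taken from this repository's code, and the names `cosine_similarity`, `most_similar`, and `toy_vectors` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    word: str, vectors: Dict[str, Vector], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Rank every other word in `vectors` by cosine similarity to `word`."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy 3-dimensional vectors for illustration only (not real GloVe values).
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
}

print(most_similar("king", toy_vectors, top_n=2))
```

With real 300-dimensional vectors and a 400K vocabulary, a vectorized library would normally replace the pure-Python loops; the ranking logic stays the same.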

Prerequisites

No special prerequisites are required. The model is downloaded automatically from the Hugging Face model hub on first startup.

Installation

  1. Clone the repository:

    git clone https://your-repository-url.git
    cd your-repository-name
  2. Install dependencies:

    pip install -r requirements.txt

API Documentation

For detailed API documentation, see API Documentation.

Quick Endpoint Reference

  • GET / - API information
  • GET /health - API health check
  • POST /embedding - Get embedding for a single word
  • POST /embeddings - Get embeddings for multiple words
  • GET /similar/{word} - Get similar words
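
As an illustration of calling the endpoints above from Python, here is a small client sketch built on the standard library. The request body shape (`{"word": ...}`) and the base URL are assumptions, since this README does not specify them; check the Swagger page of a running instance for the actual schema. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the API is reachable on the port used by the production command.
BASE_URL = "http://localhost:8080"


def build_request(
    method: str, path: str, payload: Optional[dict] = None, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Create a JSON request object without sending it."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url + path, data=data, headers=headers, method=method
    )


def embedding_request(word: str) -> urllib.request.Request:
    # Assumption: POST /embedding accepts a JSON body of the form {"word": "..."}.
    return build_request("POST", "/embedding", {"word": word})


def similar_request(word: str) -> urllib.request.Request:
    # The word is percent-encoded so that spaces and slashes stay inside the path segment.
    return build_request("GET", "/similar/" + urllib.parse.quote(word, safe=""))


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_request("GET", "/health"))` would return the decoded health-check response.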

Development

Code Quality

Format your code:

black .
isort .

Run linting:

flake8 .

Testing

Run tests with coverage:

pytest tests/ -v --cov=app --cov-report=term-missing

Running the API

Local Development

uvicorn app:app --reload

Production

uvicorn app:app --host 0.0.0.0 --port 8080
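
The same production server can also be started from Python via `uvicorn.run`. The sketch below mirrors the command above; reading `HOST` and `PORT` from the environment is an assumption added for illustration (this README does not define those variables), and `server_settings` is a hypothetical helper.

```python
import os
from typing import Mapping


def server_settings(env: Mapping[str, str]) -> dict:
    """Build keyword arguments for uvicorn.run, defaulting to the values in the command above."""
    return {
        "app": "app:app",
        "host": env.get("HOST", "0.0.0.0"),  # assumption: optional HOST override
        "port": int(env.get("PORT", "8080")),  # assumption: optional PORT override
    }


def main() -> None:
    # Imported lazily so server_settings can be used without uvicorn installed.
    import uvicorn

    uvicorn.run(**server_settings(os.environ))


if __name__ == "__main__":
    main()
```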

API Endpoints

  • GET / - API information
  • GET /health - API health check
  • POST /embedding - Get embedding for a single word
  • POST /embeddings - Get embeddings for multiple words
  • GET /similar/{word} - Get similar words
  • POST /tokenize - Tokenize text using GPT-3/4 tokenizers
  • GET /available-tokenizers - Get a list of available tokenizers

Deployment

Replit Deployment

This project is configured for deployment on Replit. Simply import the repository and click Run.

Docker Deployment

  1. Build the Docker image:

    docker build -t glove-api .
  2. Run the container:

    docker run -p 8080:8080 glove-api

CI/CD Pipeline

The project includes a GitHub Actions workflow that:

  1. Runs on every push and pull request to main/master branches
  2. Checks code formatting with Black
  3. Validates imports with isort
  4. Runs linting with Flake8
  5. Executes the test suite
  6. Deploys to Replit (on main/master)
  7. Builds and pushes Docker image (on main/master)

Required secrets for deployment:

  • REPL_ID: Your Replit project ID
  • REPLIT_TOKEN: Your Replit authentication token
  • DOCKERHUB_USERNAME: Your Docker Hub username
  • DOCKERHUB_TOKEN: Your Docker Hub access token

Model Information

Pre-trained GloVe vectors trained on the Wikipedia and Gigaword datasets:

  • 6B tokens
  • 400K vocab
  • 300d vectors
  • Uncased

Working with Large Files

This repository uses Git LFS to handle large files. The following file types are tracked by Git LFS:

  • .model - Model files
  • .npy - NumPy array files
  • .bin - Binary files
  • .vec - Vector files
  • .gz - Compressed files

When cloning the repository, make sure Git LFS is installed, then run git lfs pull to download the model files.
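
If `git lfs pull` has not been run, tracked files exist on disk only as small text pointers, which can cause confusing load errors. The sketch below detects that case by checking for the standard Git LFS pointer header; the function names are hypothetical, and the suffix list simply repeats the tracked file types listed above.

```python
from pathlib import Path
from typing import Iterable, List

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

# File types this README lists as tracked by Git LFS.
TRACKED_SUFFIXES = (".model", ".npy", ".bin", ".vec", ".gz")


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file is still an LFS pointer rather than real content."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_unfetched(root: Path, suffixes: Iterable[str] = TRACKED_SUFFIXES) -> List[Path]:
    """List tracked files under `root` that have not been downloaded yet."""
    wanted = tuple(suffixes)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix in wanted and is_lfs_pointer(path)
    )


if __name__ == "__main__":
    missing = find_unfetched(Path("."))
    for path in missing:
        print(f"Not fetched yet (run 'git lfs pull'): {path}")
```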

License

MIT
