Advanced Grammar Correction Suite

An end-to-end grammar and punctuation correction system built with a FastAPI backend and a Streamlit frontend. The service combines rule-based checks, multiple Hugging Face generative models, and structure-aware text processing to deliver high-quality corrections for everything from quick sentences to long, formatted documents.


Key Features

  • FastAPI Backend

    • LanguageTool integration with intelligent chunking to keep corrections fast and consistent.
    • Multiple transformer models (prithivida/grammar_error_correcter_v1, vennify/t5-base-grammar-correction, pszemraj/flan-t5-large-grammar-synthesis, grammarly/coedit-large) loaded with GPU support when available.
    • Dynamic model selection (auto, lightweight, best) plus a text2text pipeline fallback for premium quality.
    • Text structure analysis (word/sentence/paragraph counts, formatting detection) to adapt processing strategy, batching, and prompts automatically.
    • Inline diff highlighting and similarity scoring to explain each correction (a minimal sketch of this idea follows the feature list).
    • Health and model metadata endpoints for observability.
  • Streamlit Frontend

    • Rich UI with live metrics, sample texts, and clipboard helpers.
    • Sidebar controls for model preferences, advanced options, and API health checks.
    • Progress indicators, error handling, and download options (plain text, report, JSON).
    • Designed to support both quick edits and multi-paragraph document reviews.
  • Deployment Ready

    • Dockerfile configured with Java (for LanguageTool) and caching for large Hugging Face models.
    • Environment variables for Hugging Face caching (HF_HOME, TRANSFORMERS_CACHE) pre-set in the container.
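
The inline diff highlighting and similarity scoring mentioned above can be sketched with Python's standard difflib. This is a hypothetical helper illustrating the idea, not the repository's exact implementation:

import difflib

def highlight_and_score(original: str, corrected: str) -> tuple[str, float]:
    # Compare word sequences; wrap changed words in a <span> and score 0-1.
    orig_words, corr_words = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(None, orig_words, corr_words)
    parts = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        chunk = " ".join(corr_words[j1:j2])
        if op == "equal":
            parts.append(chunk)
        elif chunk:  # "replace" or "insert": highlight the new text
            parts.append(f"<span class='diff'>{chunk}</span>")
    return " ".join(parts), matcher.ratio()

highlighted, similarity = highlight_and_score("She go to school.", "She goes to school.")
print(highlighted)   # She <span class='diff'>goes</span> to school.
print(similarity)    # 0.75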

Project Structure

Grammar/
├── grammer_backend.py     # FastAPI application with correction pipeline
├── grammer_frontend.py    # Streamlit UI for interacting with the API
├── grammerly.py           # Extended experimental backend (not used by default)
├── requirements.txt       # Python dependencies
└── Dockerfile             # Container definition (serves FastAPI by default)

Note: grammerly.py contains an advanced, experimental service with websockets and background tasks. The primary production entrypoints are grammer_backend.py (API) and grammer_frontend.py (UI).


Getting Started

Prerequisites

  • Python 3.12+
  • (Optional) CUDA-capable GPU for faster inference
  • Java runtime (installed automatically in Docker; required by LanguageTool when running natively)

1. Clone and set up a virtual environment

git clone <your-repo-url>
cd Grammar
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt

2. Run the FastAPI backend

uvicorn grammer_backend:app --host 0.0.0.0 --port 8000 --reload

The server exposes:

  • POST /correct/ – main correction endpoint
  • GET /models/ – model status (loaded models, device, pipeline availability)
  • GET /health/ – quick health check

On first run, large transformer weights may download from Hugging Face. Ensure you have sufficient disk space and a stable connection.
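
Once the server is up, a quick smoke test from Python (using the requests library, and assuming the default port 8000) might look like this; the payload fields mirror the API reference below:

import requests

# Confirm the backend is alive and see how many models loaded.
print(requests.get("http://localhost:8000/health/").json())

# Run a short sentence through the main correction endpoint.
resp = requests.post(
    "http://localhost:8000/correct/",
    json={"text": "She go to school everyday.", "model_preference": "lightweight"},
)
print(resp.json()["final_best"])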

3. Launch the Streamlit frontend (in a second terminal)

streamlit run grammer_frontend.py

By default, the UI expects the API at http://localhost:8000. Adjust API_BASE_URL in grammer_frontend.py if you deploy elsewhere.
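
If you deploy the backend elsewhere, one common pattern is to have grammer_frontend.py read the URL from an environment variable instead of a hard-coded constant. This is a suggested tweak, not how the file currently works:

import os

# Read the backend URL from the environment, falling back to localhost.
API_BASE_URL = os.environ.get("API_BASE_URL", "http://localhost:8000")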


API Reference

POST /correct/

Request body

{
  "text": "Your input text here.",
  "use_best_model": true,
  "model_preference": "auto",      // "auto" | "lightweight" | "best"
  "preserve_formatting": true,
  "processing_mode": "intelligent" // currently accepted but reserved for future use
}

Response outline

{
  "original": "...",
  "text_analysis": {
    "word_count": 123,
    "sentence_count": 10,
    "paragraph_count": 3,
    "structure_type": "multi_paragraph",
    "needs_segmentation": true,
    "avg_paragraph_length": 41.0
  },
  "results": {
    "language_tool": {
      "corrected": "...",
      "highlighted": "<span>...</span>",
      "similarity": 0.94
    },
    "best_pipeline": {
      "corrected": "...",
      "highlighted": "...",
      "similarity": 0.97
    },
    "huggingface": {
      "pszemraj/flan-t5-large-grammar-synthesis": {
        "corrected": "...",
        "highlighted": "...",
        "similarity": 0.98
      }
    }
  },
  "final_best": "Best end-to-end correction",
  "models_used": [
    "prithivida/grammar_error_correcter_v1",
    "vennify/t5-base-grammar-correction",
    "pszemraj/flan-t5-large-grammar-synthesis",
    "grammarly/coedit-large"
  ],
  "device": "cuda",
  "processing_strategy": {
    "chunks_used": true,
    "structure_type": "multi_paragraph",
    "batch_size_used": 2
  }
}
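
As a worked example, the following Python client (field names taken from the request and response outlines above) posts a document and extracts the consensus correction plus per-model similarity scores:

import requests

payload = {
    "text": "Their going to the park tomorow.",
    "use_best_model": True,
    "model_preference": "auto",
    "preserve_formatting": True,
}
data = requests.post("http://localhost:8000/correct/", json=payload).json()

print("Best correction:", data["final_best"])
print("Structure:", data["text_analysis"]["structure_type"])
for model, result in data["results"].get("huggingface", {}).items():
    print(f"{model}: similarity={result['similarity']}")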

GET /models/

Returns a list of loaded models, device information, and whether the premium pipeline is available.

GET /health/

Simple status payload with number of loaded models and current device.


Frontend Walkthrough

  1. Settings Sidebar

    • Toggle premium models, choose quality vs. speed (auto, lightweight, best).
    • Optional similarity scores and comparison toggles.
    • API health check button to retrieve backend status.
  2. Text Input Panel

    • Sample snippets for quick testing.
    • Word/character/sentence counters update live.
    • Clear text and clipboard helper buttons.
  3. Correction Flow

    • Visual progress bar and status message while waiting on the API.
    • Metrics showing processing time, model count, and device used.
    • Expandable cards for original text, best correction, LanguageTool results, best pipeline, and per-model outputs.
    • Download buttons for plain text, a formatted report, or the raw JSON.

Docker Usage

Build and run the FastAPI service in a container:

docker build -t grammar-api .
docker run -p 8000:8000 grammar-api

  • The Dockerfile installs Java for LanguageTool and sets Hugging Face cache directories to /app/huggingface.
  • To include the Streamlit UI in a container, add another stage or run Streamlit separately against the containerized API (see the Compose sketch below).
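
One way to run both services together is a docker-compose.yml along these lines. This is a sketch: it assumes the image built from the existing Dockerfile also contains Streamlit (it is in requirements.txt) and that the frontend reads API_BASE_URL from the environment as suggested earlier:

services:
  api:
    build: .
    ports:
      - "8000:8000"
  ui:
    build: .
    command: streamlit run grammer_frontend.py --server.port 8501 --server.address 0.0.0.0
    environment:
      - API_BASE_URL=http://api:8000
    ports:
      - "8501:8501"
    depends_on:
      - api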

Troubleshooting

  • Large model downloads fail: configure Hugging Face authentication or predownload models into HF_HOME/TRANSFORMERS_CACHE (a predownload sketch follows this list).
  • LanguageTool errors: ensure Java is available. In constrained environments, the code falls back to LanguageToolPublicAPI, but rate limits may apply.
  • API timeouts: long documents can trigger the default 5-minute timeout on the frontend. Increase the Streamlit request timeout or process in smaller chunks.
  • GPU not detected: verify CUDA drivers; the backend will run on CPU if GPU is unavailable, but processing slows down.
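
For the first item above, the models can be pulled into the cache before the server ever starts. The model IDs are the ones listed under Key Features; set HF_HOME or TRANSFORMERS_CACHE beforehand if you want a custom location:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODELS = [
    "prithivida/grammar_error_correcter_v1",
    "vennify/t5-base-grammar-correction",
    "pszemraj/flan-t5-large-grammar-synthesis",
    "grammarly/coedit-large",
]

# Each from_pretrained call downloads weights into the local Hugging Face cache.
for name in MODELS:
    AutoTokenizer.from_pretrained(name)
    AutoModelForSeq2SeqLM.from_pretrained(name)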

Contributing

  1. Fork the repository and create a new branch.
  2. Run the project's formatting and linting tools (e.g., ruff, black, mypy) before opening a PR.
  3. Document changes clearly—especially when adding models or altering the correction workflow.
  4. Add or update tests/notebooks where relevant.

Next Steps

  • Add automated tests (unit + integration) for the chunking logic and API endpoints.
  • Expose authentication or rate limiting if deploying publicly.
  • Package the frontend with the backend for a single deployment target (e.g., Docker Compose).
  • Document Hugging Face model licensing considerations before commercial use.
