Ajs2207/MLOPS-Capstone-Project

🚀 MLOps Capstone Project

MLOps Python Flask Docker Kubernetes AWS

An End-to-End Machine Learning Operations Pipeline with Automated CI/CD, Container Orchestration, and Real-Time Monitoring


✨ Overview

This project is a production-grade MLOps pipeline that covers the full lifecycle of a machine learning model, from data ingestion and experimentation through deployment and monitoring. It automates the ML workflow, containerizes the serving application, orchestrates deployments on Kubernetes, and adds metrics-based observability.

Key Highlights:

  • 🔄 End-to-End ML Pipeline: From data ingestion to production-ready API
  • ☁️ Cloud-Native Architecture: Leveraging AWS EKS, S3, and ECR
  • 🤖 Automated CI/CD: GitHub Actions for seamless deployment
  • 📈 Production-Ready: Scalable, maintainable, and monitored
  • 🐳 Container Orchestration: Docker + Kubernetes on EKS
  • 📊 Real-Time Monitoring: Prometheus & Grafana integration

🏗️ Architecture

flowchart TB
    subgraph CI/CD["CI/CD Pipeline"]
        A[GitHub Push] --> B[GitHub Actions]
        B --> C[Run Tests]
        C --> D[Build Docker]
        D --> E[Push to ECR]
        E --> F[Deploy to EKS]
    end
    
    subgraph Data["Data & Experiment Tracking"]
        G[DVC] --> H[S3 Bucket]
        I[MLflow] --> J[DagsHub]
    end
    
    subgraph K8s["Kubernetes Cluster EKS"]
        K[Flask App Pod] --> L[LoadBalancer Service]
        L --> M[External Traffic]
    end
    
    subgraph Monitor["Monitoring Stack"]
        N[Prometheus] --> O[Grafana]
        O --> P[Dashboards]
    end
    
    F --> K
    K --> N

🛠️ Tech Stack

Backend & ML

  • Python 3.10 - Core programming language
  • Flask API - Web framework for model serving
  • Pandas & NumPy - Data manipulation and computation
  • Scikit-learn - Machine learning algorithms

Data Version Control & Experiment Tracking

  • DVC for Data Version Control - Track datasets and models
  • MLflow Experiment Tracking - Experiment logging and comparison
  • DagsHub Remote Tracking - Remote experiment repository

Containerization & Orchestration

  • Docker Containerization - Container images and deployment
  • Kubernetes (EKS) - Container orchestration platform
  • eksctl Cluster Management - AWS EKS cluster provisioning
  • kubectl CLI - Kubernetes command-line tool

Cloud Services (AWS)

  • Amazon EKS - Fully managed Kubernetes service
  • AWS S3 Storage - Scalable object storage
  • Amazon ECR - Private Docker image registry
  • AWS IAM Security - Identity and access management

CI/CD & Automation

  • GitHub Actions CI/CD - Automated workflows and testing
  • Secrets Management - Secure credential handling with GitHub Secrets

Monitoring & Observability

  • Prometheus Metrics Collection - Time-series metrics database
  • Grafana Dashboards - Visualization and monitoring
  • Custom Logging Framework - Application-level logging
  • Real-time Application Monitoring - Performance tracking and alerting
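The application-level logging listed above could be set up along the following lines. This is a minimal sketch using only the standard library; it is an assumption about what a module such as src/logger.py might contain, not the project's actual code. The function name `get_logger`, the `logs` directory, and the message format are all hypothetical.

```python
import logging
import sys
from logging.handlers import RotatingFileHandler
from pathlib import Path


def get_logger(name: str, log_dir: str = "logs", level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes to stdout and to a rotating file.

    Hypothetical helper: the name, paths, and format are illustrative assumptions.
    """
    logger = logging.getLogger(name)
    if logger.handlers:  # already configured, avoid attaching duplicate handlers
        return logger

    logger.setLevel(level)
    formatter = logging.Formatter("%(asctime)s | %(name)s | %(levelname)s | %(message)s")

    # Console output is what `kubectl logs` shows for a running pod.
    console = logging.StreamHandler(sys.stdout)
    console.setFormatter(formatter)
    logger.addHandler(console)

    # A rotating file keeps local logs bounded (5 MB per file, 3 backups).
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    file_handler = RotatingFileHandler(
        Path(log_dir) / f"{name}.log", maxBytes=5 * 1024 * 1024, backupCount=3
    )
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    logger.propagate = False  # do not emit a second time through the root logger
    return logger
```

Each pipeline stage could then call `get_logger("data_ingestion")` once at import time and reuse the returned logger.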

🚀 Features

📊 Data Management

  • Data Version Control with DVC - Track dataset and model versions
  • Automated Data Pipelines - DVC repro for reproducible workflows
  • S3 Integration - Scalable remote storage of data artifacts

🧪 Experiment Tracking

  • MLflow Integration with DagsHub - Experiment logging and comparison
  • Model Registry - Versioning and promoting models to production
  • Parameter Management - Centralized params.yaml configuration

🔄 CI/CD Pipeline

  • Automated Testing - Comprehensive tests on every push using GitHub Actions
  • Docker Image Building - Automatic building and pushing to AWS ECR
  • Zero-Downtime Deployment - Seamless deployment to EKS cluster
  • Environment Secrets Management - Secure handling with GitHub Secrets

☁️ Cloud Infrastructure

  • AWS EKS Cluster - Managed Kubernetes control plane for container orchestration
  • AWS S3 Storage - Persistent data storage with high availability
  • AWS ECR - Private Docker image registry for secure image management
  • LoadBalancer Service - External traffic routing through an AWS load balancer

📈 Monitoring & Observability

  • Prometheus Integration - Metrics collection and scraping from applications
  • Grafana Dashboards - Real-time visualization and monitoring
  • Custom Metrics - Application-level metrics from Flask
  • Alerting Capabilities - Proactive monitoring and notifications

📁 Project Structure

MLOPS-Capstone-Project/
│
├── .github/workflows/
│   └── ci.yaml                 # CI/CD pipeline definition
│
├── flask_app/
│   ├── app.py                  # Flask application entry point
│   ├── requirements.txt        # Python dependencies
│   └── Dockerfile              # Container configuration
│
├── src/
│   ├── __init__.py
│   ├── logger.py               # Logging configuration
│   ├── data_ingestion.py       # Data loading module
│   ├── data_preprocessing.py   # Data cleaning & transformation
│   ├── feature_engineering.py  # Feature creation module
│   ├── model_building.py       # Model training script
│   ├── model_evaluation.py     # Model performance metrics
│   └── register_model.py       # Model version registration
│
├── tests/                      # Unit & integration tests
├── scripts/                    # Utility scripts
├── models/                     # Saved model artifacts
│
├── dvc.yaml                    # DVC pipeline definition
├── params.yaml                 # Hyperparameters configuration
├── requirements.txt            # Core dependencies
├── Dockerfile                  # Root Docker configuration
├── deployment.yaml             # Kubernetes deployment manifest
│
└── README.md                   # Project documentation

⚙️ Installation & Setup

Prerequisites

  • Python 3.10+ - Programming language runtime
  • Docker Desktop - Container environment for local development
  • kubectl - Kubernetes command-line interface
  • eksctl - AWS EKS cluster management tool
  • AWS CLI - Amazon Web Services command-line interface
  • Git - Version control system
  • Conda/Miniconda - Python environment manager

Local Development Setup

# Clone repository
git clone https://github.com/Ajs2207/MLOPS-Capstone-Project.git
cd MLOPS-Capstone-Project

# Create virtual environment
conda create -n atlas python=3.10
conda activate atlas

# Install dependencies
pip install -r requirements.txt

# Run locally
cd flask_app
python app.py

DVC Pipeline Execution

# Initialize DVC
dvc init

# Configure remote storage
dvc remote add -d myremote s3://your-bucket-name

# Run pipeline
dvc repro

# Check status
dvc status
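`dvc repro` reruns a stage only when one of its dependencies has changed, which it detects by comparing content hashes with those recorded in dvc.lock. The sketch below illustrates that idea with the standard library only. It is a simplified model, not DVC's implementation; `file_md5`, `stage_is_stale`, and the flat path-to-hash dictionary are hypothetical.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path) -> str:
    """Hash a file's content in chunks so large datasets are not loaded into memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_is_stale(deps: list[Path], recorded: dict[str, str]) -> bool:
    """Return True if any dependency is new or its hash differs from the recorded one."""
    return any(recorded.get(str(dep)) != file_md5(dep) for dep in deps)
```

Under this model, editing params.yaml or a raw data file changes its hash, so every stage that lists it as a dependency is considered stale, while untouched stages are skipped.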

Docker Build & Run

# Build image
docker build -t capstone-app:latest .

# Run container
docker run -p 8888:5000 -e CAPSTONE_TEST=<your-token> capstone-app:latest

🔧 Pipeline Components

| Component | Description | Tools |
| --- | --- | --- |
| Data Ingestion | Fetches data from source | Python, Pandas |
| Data Preprocessing | Cleaning & data transformation | Pandas, NumPy |
| Feature Engineering | Creates features for modeling | Scikit-learn |
| Model Building | Training & hyperparameter tuning | Scikit-learn |
| Model Evaluation | Performance metrics calculation | Scikit-learn, MLflow |
| Model Registration | Version control & storage | MLflow, DVC |

🌐 Deployment

CI/CD Pipeline Stages

| Stage | Tools | Purpose |
| --- | --- | --- |
| Code Commit | Git, GitHub | Version control & collaboration |
| Continuous Integration | GitHub Actions, pytest | Automated testing & validation |
| Container Build | Docker, Dockerfile | Image creation & optimization |
| Registry Push | AWS ECR, AWS CLI | Private image storage |
| Continuous Deployment | kubectl, eksctl | Kubernetes deployment |
| Health Check | curl, custom scripts | Deployment verification |

Deploy to EKS

# Create EKS cluster
eksctl create cluster --name flask-app-cluster --region us-east-1 \
  --nodegroup-name flask-app-nodes --node-type t3.small --nodes 1

# Update kubeconfig
aws eks --region us-east-1 update-kubeconfig --name flask-app-cluster

# Deploy application
kubectl apply -f deployment.yaml

# Get LoadBalancer URL
kubectl get svc flask-app-service

# Test the deployment
curl http://<external-ip>:5000
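A freshly created load balancer is often not reachable right away, so a single `curl` can fail even when the deployment is fine. The following is a minimal sketch of a retrying health check in Python, using only the standard library. The function name `wait_until_healthy` and its default retry count, delay, and timeout are assumptions, not part of the project's scripts.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(url: str, attempts: int = 10, delay_s: float = 5.0,
                       timeout_s: float = 3.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet (DNS not propagated, pod still starting, ...)
        if attempt < attempts:
            time.sleep(delay_s)
    return False
```

A CI step could call `wait_until_healthy("http://<external-ip>:5000")` and fail the job when it returns `False`.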

Sample Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: flask-app
  template:
    metadata:
      labels:
        app: flask-app
    spec:
      containers:
      - name: flask-app
        image: <aws-account-id>.dkr.ecr.us-east-1.amazonaws.com/capstone-proj:latest
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: flask-app-service
spec:
  type: LoadBalancer
  ports:
    - port: 5000
      targetPort: 5000
  selector:
    app: flask-app
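The `image` field in the manifest follows the ECR URI pattern `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. As a small sketch, a helper like the one below could assemble that reference, for example with a commit SHA as the tag so each rollout is traceable. The helper name `ecr_image_uri` is hypothetical, and tagging by commit SHA is a suggestion rather than what the project's workflow is known to do.

```python
import re


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build an ECR image reference; AWS account IDs are always 12 digits."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be a 12-digit AWS account ID")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"
```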

📊 Monitoring & Observability

Prometheus Configuration

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "flask-app"
    static_configs:
      - targets: ["<load-balancer-dns>:5000"]

Metrics Tracked

| Metric | Description |
| --- | --- |
| Request Latency | Time taken to process requests |
| Request Throughput | Number of requests per second |
| Model Prediction Time | Inference latency |
| Error Rates | Percentage of failed requests |
| System Resources | CPU & Memory utilization |
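To make the request-level metrics above concrete, here is a stdlib-only sketch that records call counts, failures, and cumulative latency per endpoint and renders them in the Prometheus text exposition format. It is an illustration under assumptions: the metric names, the `track` decorator, and `render_metrics` are hypothetical, the registry is not thread-safe, and a real Flask app would more likely use a Prometheus client library than hand-rolled counters.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory registry keyed by endpoint name (hypothetical, not thread-safe).
_requests_total = defaultdict(int)
_errors_total = defaultdict(int)
_latency_sum = defaultdict(float)


def track(endpoint: str):
    """Decorator that records call count, failures, and cumulative latency."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                _errors_total[endpoint] += 1
                raise
            finally:
                _requests_total[endpoint] += 1
                _latency_sum[endpoint] += time.perf_counter() - start
        return wrapper
    return decorator


def render_metrics() -> str:
    """Render the registry in the Prometheus text exposition format."""
    lines = ["# TYPE app_requests_total counter"]
    for ep, n in sorted(_requests_total.items()):
        lines.append(f'app_requests_total{{endpoint="{ep}"}} {n}')
    lines.append("# TYPE app_request_errors_total counter")
    for ep, n in sorted(_errors_total.items()):
        lines.append(f'app_request_errors_total{{endpoint="{ep}"}} {n}')
    lines.append("# TYPE app_request_latency_seconds summary")
    for ep, total in sorted(_latency_sum.items()):
        lines.append(f'app_request_latency_seconds_sum{{endpoint="{ep}"}} {total:.6f}')
        lines.append(f'app_request_latency_seconds_count{{endpoint="{ep}"}} {_requests_total[ep]}')
    return "\n".join(lines) + "\n"
```

Serving the output of `render_metrics()` at `/metrics` would let Prometheus scrape it. Throughput and error rate then do not need to be stored separately, since they can be derived in PromQL, for example with `rate(app_requests_total[1m])` and the ratio of the error counter to the request counter.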

Grafana Setup

  • Launch an EC2 instance for Grafana (t3.medium, 20 GB SSD)
  • Configure security group to allow port 3000
  • Install Grafana and add Prometheus as data source
  • Create custom dashboards for visualization and monitoring

🀝 Contributing

  • Fork the repository
  • Create a feature branch - git checkout -b feature/amazing-feature
  • Commit your changes - git commit -m 'Add amazing feature'
  • Push to the branch - git push origin feature/amazing-feature
  • Open a Pull Request

📞 Contact

For questions, feedback, or collaboration opportunities:


📝 License

This project is for demonstration purposes as part of an MLOps portfolio.

⭐ Star this repo if you find it useful!

Built with ❤️ for Production ML Systems
