Flit: End-to-End Analytics Platform 🚀

Comprehensive data platform demonstrating production-ready analytics, experimentation, and ML systems for modern e-commerce.


🏢 Business Context

Flit is a (hypothetical) fast-growing e-commerce logistics company that needed to move from gut-feeling decision-making to data-driven growth. This platform demonstrates how modern data teams build end-to-end analytics infrastructure that scales from startup to enterprise.

Business Impact Delivered:

  • πŸ” Data Foundation: Unified analytics reducing analyst query time by xx%
  • πŸ“Š Smart Experimentation: A/B testing framework with CUPED variance reduction
  • πŸ€– Predictive Intelligence: LTV and churn models driving targeted campaigns
  • πŸ’¬ AI-Powered Operations: RAG documentation assistant reducing support tickets

πŸ—οΈ Platform Architecture

graph TB
    subgraph "Data Sources"
        B[E-commerce Transactions]
        C[Synthetic Experiments]
    end
    
    subgraph "Data Platform"
        D[BigQuery Warehouse]
        E[dbt Transformations] 
        F[Cost Optimization]
    end
    
    subgraph "Analytics Layer"
        G[Experiment Framework]
        H[ML Models & APIs]
        I[AI Documentation Assistant]
    end
    
    subgraph "User Interfaces"
        J[Executive Dashboard]
        K[Experiment Results]
        L[Model Monitoring]
    end


    B --> D  
    C --> D
    D --> E
    E --> F
    E --> G
    E --> H
    E --> I
    G --> K
    H --> L
    I --> J
    K --> J
    L --> J

πŸ—‚οΈ Repository Structure

| Repository | Purpose | Tech Stack | Status |
|---|---|---|---|
| flit-data-platform | Core data warehouse & transformations | dbt, BigQuery, SQL | 🔄 |
| flit-experiments | A/B testing & advanced experimentation | Python, Streamlit, Statistics | 🔄 |
| flit-ml-api | ML models, APIs & MLOps | FastAPI, XGBoost, Docker, GCP | ⏳ |
| flit-ai-assistant | RAG-powered documentation bot | LangChain, ChromaDB, OpenAI/Llama | ⏳ |

🚀 Quick Start

Prerequisites

  • GCP Account with BigQuery API enabled
  • Python 3.9+ with pip
  • Docker for containerized deployment
  • Git with SSH keys configured

1. Clone All Repositories

# Main orchestration repo
git clone https://github.com/whitehackr/flit-main.git
cd flit-main

# Register the component repositories as submodules (first-time setup only)
git submodule add https://github.com/whitehackr/flit-data-platform.git data-platform
git submodule add https://github.com/whitehackr/flit-experiments.git experiments  
git submodule add https://github.com/whitehackr/flit-ml-api.git ml-api
git submodule add https://github.com/whitehackr/flit-ai-assistant.git ai-assistant

# Pull all submodules
git submodule update --init --recursive

2. Environment Setup

# Copy environment template
cp .env.example .env

# Edit with your GCP credentials
nano .env

Required Environment Variables:

# GCP Configuration
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json

# BigQuery Datasets  
FLIT_RAW_DATASET=flit_raw
FLIT_STAGING_DATASET=flit_staging
FLIT_MARTS_DATASET=flit_marts

# API Configuration
ML_API_BASE_URL=https://your-api-url.run.app
OPENAI_API_KEY=your-openai-key

# Streamlit Configuration
STREAMLIT_SHARING_MODE=true
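
A minimal Python sketch for loading and validating these variables up front, assuming python-dotenv is installed (the helper file and function are illustrative, not part of the repo):

# config_check.py -- hypothetical helper
import os
from dotenv import load_dotenv

REQUIRED_VARS = [
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "FLIT_RAW_DATASET",
    "FLIT_STAGING_DATASET",
    "FLIT_MARTS_DATASET",
]

def load_config() -> dict:
    """Load .env and fail fast if any required variable is missing."""
    load_dotenv()  # reads .env from the current working directory
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}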

3. Full Platform Deployment

# Deploy entire platform with Docker Compose
docker-compose up -d

# Or deploy components individually
make deploy-data-platform
make deploy-experiments  
make deploy-ml-api
make deploy-ai-assistant

4. Initialize Data Pipeline

# Run initial data ingestion
cd data-platform && dbt run --full-refresh

# Generate sample experiments
cd experiments && python generate_sample_data.py

# Train initial ML models
cd ml-api && python train_models.py

5. Access Live Demos

📊 Platform Components

Data Platform (dbt + BigQuery)

Repository: flit-data-platform

  • Raw Data Ingestion: e-commerce transactions, synthetic experiments
  • Transformation Pipeline: Staging → Intermediate → Marts architecture
  • Cost Optimization: CC% reduction in BigQuery spend through query optimization (see the dry-run sketch after this list)
  • Data Quality: Comprehensive testing and monitoring with Great Expectations
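
For the cost-optimization work referenced above, one cheap guardrail is to dry-run queries and inspect bytes scanned before executing them. A sketch using the google-cloud-bigquery client (the price constant reflects on-demand pricing and should be checked against current GCP rates):

# cost_guard.py -- illustrative sketch
from google.cloud import bigquery

def estimate_query_cost(sql: str, price_per_tib_usd: float = 6.25) -> float:
    """Dry-run a query and return its estimated on-demand cost in USD."""
    client = bigquery.Client()  # uses GOOGLE_CLOUD_PROJECT and ADC credentials
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)  # dry runs scan nothing and are free
    tib_scanned = job.total_bytes_processed / 2**40
    return tib_scanned * price_per_tib_usd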

Key Deliverables:

  • Unified customer 360° view
  • Real-time experiment exposure tracking
  • ML-ready data
  • Executive KPI dashboards

Experimentation Framework

Repository: flit-experiments

  • Classical A/B Testing: Hypothesis testing with power analysis
  • CUPED Implementation: Variance reduction using pre-experiment data (see the sketch after this list)
  • Advanced Methods: Multi-armed bandits and sequential testing
  • Interactive Apps: Streamlit dashboards for live experiment analysis
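
The CUPED adjustment mentioned above reduces metric variance by regressing out a pre-experiment covariate. A minimal sketch, assuming per-user NumPy arrays of the experiment metric and its pre-period counterpart:

import numpy as np

def cuped_adjust(metric: np.ndarray, covariate: np.ndarray) -> np.ndarray:
    """Return CUPED-adjusted values: Y' = Y - theta * (X - mean(X))."""
    theta = np.cov(metric, covariate, ddof=1)[0, 1] / np.var(covariate, ddof=1)
    return metric - theta * (covariate - covariate.mean())

# The adjusted metric's variance shrinks by a factor of (1 - rho^2), where rho is
# the correlation between metric and covariate, so the usual t-test on the adjusted
# values reaches the same power with a smaller sample.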

Key Deliverables:

  • XX% reduction in required sample sizes with CUPED
  • Bayesian stopping rules preventing p-hacking
  • Thompson Sampling for multi-variant optimization (see the sketch after this list)
  • Comprehensive statistical test suite
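
The Thompson Sampling deliverable above can be sketched with a Beta-Bernoulli model for conversion-style metrics (variant names and counts are illustrative):

import numpy as np

rng = np.random.default_rng(42)

# observed [successes, failures] per variant -- illustrative counts
posterior = {"control": [120, 880], "variant_a": [140, 860], "variant_b": [131, 869]}

def choose_variant() -> str:
    """Sample a conversion rate from each Beta posterior and play the argmax."""
    draws = {name: rng.beta(1 + s, 1 + f) for name, (s, f) in posterior.items()}
    return max(draws, key=draws.get)

def update(variant: str, converted: bool) -> None:
    """Fold the observed outcome back into the chosen variant's posterior."""
    posterior[variant][0 if converted else 1] += 1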

ML Platform & MLOps

Repository: flit-ml-api

  • Predictive Models: Customer LTV and churn prediction
  • Production APIs: FastAPI endpoints with automatic documentation (see the endpoint sketch after this list)
  • MLOps Pipeline: CI/CD with GitHub Actions and Cloud Run deployment
  • Model Monitoring: Drift detection and performance tracking
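
A minimal sketch of what a churn-scoring endpoint could look like; the route, feature names, and model artifact path are assumptions for illustration, not the actual flit-ml-api contract:

# churn_api.py -- illustrative sketch (assumes pydantic v2)
import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Flit ML API (sketch)")
model = joblib.load("models/churn_xgb.joblib")  # hypothetical trained artifact

class CustomerFeatures(BaseModel):
    recency_days: int
    frequency_90d: int
    monetary_90d: float

@app.post("/predict/churn")
def predict_churn(features: CustomerFeatures) -> dict:
    """Score a single customer and return the churn probability."""
    X = pd.DataFrame([features.model_dump()])
    return {"churn_probability": float(model.predict_proba(X)[0, 1])}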

Key Deliverables:

  • YY% accuracy in churn prediction
  • LTV model driving $ZM+ in targeted campaigns
  • Automated retraining pipeline
  • SHAP explanations for model interpretability
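
A typical SHAP pattern for a fitted tree-based model (the model and feature frame are assumed to already exist):

import shap

def explain_model(model, X):
    """Compute SHAP values for a fitted tree-based model and plot a global summary."""
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)
    shap.summary_plot(shap_values, X)  # global feature importance
    return shap_values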

AI Documentation Assistant

Repository: flit-ai-assistant

  • RAG Architecture: Retrieval-augmented generation for internal docs (see the sketch after this list)
  • Vector Database: Semantic search with ChromaDB
  • Chat Interface: Natural language querying of company knowledge
  • Integration: Embedded in main platform dashboard
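
A stripped-down sketch of the retrieve-then-generate loop, written directly against chromadb and the OpenAI client for brevity (the repository itself wires this through LangChain; the collection name, model, and prompt are illustrative):

import chromadb
from openai import OpenAI

chroma = chromadb.Client()  # in-memory; a real deployment would use a persistent store
docs = chroma.get_or_create_collection("flit_docs")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, n_results: int = 3) -> str:
    """Retrieve the most relevant doc chunks and generate a grounded answer."""
    hits = docs.query(query_texts=[question], n_results=n_results)
    context = "\n\n".join(hits["documents"][0])
    response = llm.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": "Answer using only the provided context and cite your sources."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content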

Key Deliverables:

  • BB% reduction in internal support tickets
  • Instant access to technical documentation
  • Context-aware responses with source citations
  • Scalable knowledge base ingestion

🔧 Development Workflow

Local Development

# Setup development environment
make setup-dev

# Run tests across all components
make test-all

# Lint and format code
make lint-fix

# Start development servers
make dev-start

Data Pipeline Management

# dbt workflow
cd data-platform
dbt deps
