Comprehensive data platform demonstrating production-ready analytics, experimentation, and ML systems for modern e-commerce.
Flit is a (hypothetical) fast-growing e-commerce logistics company that needed to move from gut-feeling decisions to data-driven growth. This platform demonstrates how a modern data team builds end-to-end analytics infrastructure that scales from startup to enterprise.
Business Impact Delivered:
- Data Foundation: Unified analytics reducing analyst query time by xx%
- Smart Experimentation: A/B testing framework with CUPED variance reduction
- Predictive Intelligence: LTV and churn models driving targeted campaigns
- AI-Powered Operations: RAG documentation assistant reducing support tickets
```mermaid
graph TB
    subgraph "Data Sources"
        B[E-commerce Transactions]
        C[Synthetic Experiments]
    end
    subgraph "Data Platform"
        D[BigQuery Warehouse]
        E[dbt Transformations]
        F[Cost Optimization]
    end
    subgraph "Analytics Layer"
        G[Experiment Framework]
        H[ML Models & APIs]
        I[AI Documentation Assistant]
    end
    subgraph "User Interfaces"
        J[Executive Dashboard]
        K[Experiment Results]
        L[Model Monitoring]
    end
    B --> D
    C --> D
    D --> E
    E --> F
    E --> G
    E --> H
    E --> I
    G --> K
    H --> L
    I --> J
    K --> J
    L --> J
```
| Repository | Purpose | Tech Stack | Status |
|---|---|---|---|
| flit-data-platform | Core data warehouse & transformations | dbt, BigQuery, SQL | In progress |
| flit-experiments | A/B testing & advanced experimentation | Python, Streamlit, Statistics | In progress |
| flit-ml-api | ML models, APIs & MLOps | FastAPI, XGBoost, Docker, GCP | ⏳ Planned |
| flit-ai-assistant | RAG-powered documentation bot | LangChain, ChromaDB, OpenAI/Llama | ⏳ Planned |
Prerequisites:
- GCP account with the BigQuery API enabled
- Python 3.9+ with pip
- Docker and Docker Compose for containerized deployment
- Git with SSH keys configured
```bash
# Clone the main orchestration repo
git clone https://github.com/whitehackr/flit-main.git
cd flit-main

# One-time setup, only if the submodules are not already registered in flit-main
# (git submodule add fails for a path that already exists in the index)
git submodule add https://github.com/whitehackr/flit-data-platform.git data-platform
git submodule add https://github.com/whitehackr/flit-experiments.git experiments
git submodule add https://github.com/whitehackr/flit-ml-api.git ml-api
git submodule add https://github.com/whitehackr/flit-ai-assistant.git ai-assistant

# Fetch and check out all submodules
git submodule update --init --recursive
```

```bash
# Copy the environment template
cp .env.example .env

# Edit it with your GCP credentials
nano .env
```

Required Environment Variables:
```bash
# GCP configuration
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json

# BigQuery datasets
FLIT_RAW_DATASET=flit_raw
FLIT_STAGING_DATASET=flit_staging
FLIT_MARTS_DATASET=flit_marts

# API configuration
ML_API_BASE_URL=https://your-api-url.run.app
OPENAI_API_KEY=your-openai-key

# Streamlit configuration
STREAMLIT_SHARING_MODE=true
```

```bash
# Deploy the entire platform with Docker Compose
docker-compose up -d

# Or deploy components individually
make deploy-data-platform
make deploy-experiments
make deploy-ml-api
make deploy-ai-assistant
```

```bash
# Build the warehouse models from scratch with dbt
# (each step runs in a subshell so the cd does not carry over to the next one)
(cd data-platform && dbt run --full-refresh)

# Generate sample experiments
(cd experiments && python generate_sample_data.py)

# Train the initial ML models
(cd ml-api && python train_models.py)
```

Once the services are running, they are available at:
- Main Dashboard: http://localhost:8501
- Experiment Results: http://localhost:8502
- ML API Docs: http://localhost:8000/docs
- AI Assistant: http://localhost:8503
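The variables in the environment template above are easy to leave unset or blank. A minimal sketch of a fail-fast check that a service could run at startup follows; the function names are hypothetical, and treating every template variable except `STREAMLIT_SHARING_MODE` as mandatory is an assumption (for example, `OPENAI_API_KEY` may only matter to the AI assistant).

```python
import os
from typing import Mapping

# Assumption: every variable from the .env template except STREAMLIT_SHARING_MODE
# is required; trim this tuple if a given service needs fewer.
REQUIRED_VARS = (
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "FLIT_RAW_DATASET",
    "FLIT_STAGING_DATASET",
    "FLIT_MARTS_DATASET",
    "ML_API_BASE_URL",
    "OPENAI_API_KEY",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def check_env(env: Mapping[str, str] = os.environ) -> None:
    """Raise a single error listing every missing variable, so startup fails fast."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `check_env()` at the top of each service's entry point reports every missing setting at once, instead of failing later on the first BigQuery or API call that needs one.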
Repository: flit-data-platform
- Raw Data Ingestion: e-commerce transactions, synthetic experiments
- Transformation Pipeline: Staging → Intermediate → Marts architecture
- Cost Optimization: CC% reduction in BigQuery spend through query optimization
- Data Quality: Comprehensive testing and monitoring with Great Expectations
Key Deliverables:
- Unified customer 360° view
- Real-time experiment exposure tracking
- ML-ready data
- Executive KPI dashboards
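The section does not say how the BigQuery savings are measured. One common approach is to dry-run a query, which validates it and reports the bytes it would scan without executing it, then convert those bytes into an on-demand cost. The sketch below assumes the `google-cloud-bigquery` client library; the helper names are hypothetical, and the per-TiB rate is a parameter because on-demand pricing varies by region and changes over time.

```python
def bytes_to_usd(total_bytes: int, usd_per_tib: float) -> float:
    """Convert bytes scanned into an on-demand cost estimate (1 TiB = 2**40 bytes)."""
    return total_bytes / 2**40 * usd_per_tib


def estimate_query_cost(sql: str, usd_per_tib: float) -> float:
    """Dry-run `sql` and return its estimated on-demand cost (hypothetical helper).

    Needs the google-cloud-bigquery package; the client picks up the project and
    credentials from GOOGLE_CLOUD_PROJECT / GOOGLE_APPLICATION_CREDENTIALS.
    """
    # Imported here so bytes_to_usd stays usable without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    # A dry run reports bytes scanned without running the query; disabling the
    # cache avoids a misleading 0-byte estimate for previously cached results.
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return bytes_to_usd(job.total_bytes_processed, usd_per_tib)
```

Because BigQuery stores data by column, comparing the estimate for a `SELECT *` against the same query restricted to the needed columns (or to a partition filter) gives a quick before/after figure for a query optimization. Dry runs themselves are not billed.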
Repository: flit-experiments
- Classical A/B Testing: Hypothesis testing with power analysis
- CUPED Implementation: Variance reduction using pre-experiment data
- Advanced Methods: Multi-armed bandits and sequential testing
- Interactive Apps: Streamlit dashboards for live experiment analysis
Key Deliverables:
- XX% reduction in required sample sizes with CUPED
- Bayesian stopping rules preventing p-hacking
- Thompson Sampling for multi-variant optimization
- Comprehensive statistical test suite
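CUPED adjusts each user's metric using the same metric measured before the experiment: `theta = cov(X, Y) / var(X)` and `Y_cuped = Y - theta * (X - mean(X))`. The stdlib-only sketch below implements that adjustment together with a standard two-sample sample-size formula, to show where the sample-size reduction comes from. It is an illustration, not the flit-experiments code, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pvariance


def cuped_adjust(y: list[float], x: list[float]) -> list[float]:
    """Return CUPED-adjusted metric values (hypothetical helper).

    y: metric during the experiment; x: same metric for the same users beforehand.
    In practice theta is usually estimated on data pooled across all arms.
    """
    mx, my = fmean(x), fmean(y)
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)
    theta = cov_xy / pvariance(x)  # population moments on both sides keep theta consistent
    # Subtracting a mean-zero term leaves the average unchanged and removes the
    # share of variance explained by the pre-experiment metric.
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]


def samples_per_arm(sigma: float, mde: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-arm sample size for a two-sided, two-sample z-test on a difference in means.

    n = 2 * (z_{1 - alpha/2} + z_{power})**2 * sigma**2 / mde**2
    """
    z = NormalDist()
    n = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 * sigma**2 / mde**2
    return math.ceil(n)
```

The adjusted metric's variance is `var(Y) * (1 - rho**2)`, where `rho` is the correlation between the pre-experiment and in-experiment values, so passing `sigma * math.sqrt(1 - rho**2)` to `samples_per_arm` shows how much CUPED shrinks the required sample size for the same minimum detectable effect.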
Repository: flit-ml-api
- Predictive Models: Customer LTV and churn prediction
- Production APIs: FastAPI endpoints with automatic documentation
- MLOps Pipeline: CI/CD with GitHub Actions and Cloud Run deployment
- Model Monitoring: Drift detection and performance tracking
Key Deliverables:
- YY% accuracy in churn prediction
- LTV model driving $ZM+ in targeted campaigns
- Automated retraining pipeline
- SHAP explainability for model interpretability
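The monitoring bullets do not name a drift metric. The Population Stability Index (PSI) is one common choice for comparing a feature's live distribution with its training distribution. The sketch below is a stdlib-only illustration with a hypothetical function name, using equal-frequency bins cut at the reference sample's quantiles.

```python
import bisect
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` against the reference sample `expected`.

    Hypothetical helper: bins are equal-frequency on the reference data, and empty
    bins are floored at `eps` so the log term stays finite.
    """
    ref = sorted(expected)
    # Interior cut points at the reference sample's k/bins quantiles.
    cuts = [ref[len(ref) * k // bins] for k in range(1, bins)]

    def shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # bisect_right maps each value to a bin index in 0..bins-1.
            counts[bisect.bisect_right(cuts, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    # PSI = sum over bins of (actual share - expected share) * ln(actual share / expected share).
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift; those thresholds are industry conventions, not values taken from this platform.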
Repository: flit-ai-assistant
- RAG Architecture: Retrieval-augmented generation for internal docs
- Vector Database: Semantic search with ChromaDB
- Chat Interface: Natural language querying of company knowledge
- Integration: Embedded in main platform dashboard
Key Deliverables:
- BB% reduction in internal support tickets
- Instant access to technical documentation
- Context-aware responses with source citations
- Scalable knowledge base ingestion
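The retrieve-then-generate loop behind a RAG assistant can be sketched without any external service. Here a toy word-count "embedding" and cosine similarity stand in for a real embedding model and the ChromaDB search, and the prompt template is an assumption; all function names are hypothetical. What the sketch shows is the shape of the pipeline: rank documents against the question, keep the top k, and hand them to the LLM tagged with their source names so the answer can cite them.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system would call an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return the k (source, text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(docs.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: dict[str, str], k: int = 2) -> str:
    """Assemble a grounded prompt whose context lines carry citable source names."""
    context = "\n".join(f"[{source}] {text}" for source, text in retrieve(query, docs, k))
    return (
        "Answer using only the context below and cite sources in brackets.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Keeping the source name attached to every retrieved chunk is what makes source citations possible: the model can only cite what the prompt labels.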
```bash
# Set up the development environment
make setup-dev

# Run tests across all components
make test-all

# Lint and format code
make lint-fix

# Start development servers
make dev-start
```

```bash
# dbt workflow
cd data-platform
dbt deps
```