A production-ready MLOps platform for sales forecasting that demonstrates modern machine learning engineering practices. Built on Astronomer (Apache Airflow), this project implements an end-to-end ML pipeline with ensemble modeling, comprehensive visualization, and real-time inference capabilities via Streamlit.
- Automated ML Pipeline: End-to-end orchestration with Astronomer/Airflow
- Ensemble Modeling: Combines XGBoost, LightGBM, and Prophet for robust predictions
- Advanced Visualizations: Comprehensive model performance analysis and comparison
- Real-time Inference: Streamlit-based web UI for interactive predictions
- Experiment Tracking: MLflow integration for model versioning and metrics
- Distributed Storage: MinIO S3-compatible object storage for artifacts
- Containerized Deployment: Docker-based architecture for consistency
| Component | Technology | Purpose |
|---|---|---|
| Orchestration | Astronomer (Airflow 3.0+) | Workflow automation and scheduling |
| ML Tracking | MLflow 2.9+ | Experiment tracking and model registry |
| Storage | MinIO | S3-compatible artifact storage |
| ML Models | XGBoost, LightGBM, Prophet | Ensemble forecasting |
| Visualization | Matplotlib, Seaborn, Plotly | Model analysis and insights |
| Inference UI | Streamlit | Interactive prediction interface |
| Containerization | Docker & Docker Compose | Environment consistency |
- Docker Desktop installed and running
- Astronomer CLI (`brew install astro` on macOS; for other operating systems, follow the Astronomer CLI installation instructions)
- 8GB+ RAM available for Docker
- Ports 8080, 8501, 5001, 9000, 9001 available
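Before running `astro dev start`, you can confirm that none of these ports is already taken. The following is a minimal stdlib sketch, not part of the repository. It treats a port as in use if something on localhost accepts a TCP connection on it. The function names are hypothetical.

```python
import socket

# Ports listed in the prerequisites (Airflow, Streamlit, MLflow, MinIO).
REQUIRED_PORTS = (8080, 8501, 5001, 9000, 9001)


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a process on host:port accepts TCP connections."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on a successful connection instead of raising.
        return sock.connect_ex((host, port)) == 0


def check_ports(ports=REQUIRED_PORTS) -> dict:
    """Map each port to True if it is already taken."""
    return {port: port_in_use(port) for port in ports}


if __name__ == "__main__":
    for port, busy in check_ports().items():
        print(f"{port}: {'IN USE' if busy else 'free'}")
```

Any port reported as `IN USE` needs to be freed (or remapped) before the stack starts.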
# Clone the repository
git clone https://github.com/airscholar/astro-salesforecast.git
cd astro-salesforecast

# Start Astronomer Airflow services
astro dev start

This will start:
- Airflow UI: http://localhost:8080 (admin/admin)
- Streamlit UI: http://localhost:8501
- MLflow UI: http://localhost:5001
- MinIO Console: http://localhost:9001 (minioadmin/minioadmin)
- Open Airflow UI at http://localhost:8080
- Enable the `sales_forecast_training` DAG
- Trigger the DAG manually or wait for the scheduled run
- Monitor progress in the Airflow UI
- Open Streamlit at http://localhost:8501
- Click "Load/Reload Models" in the sidebar
- Choose input method (upload CSV, manual entry, or sample data)
- Configure forecast parameters
- Generate predictions and export results
- Synthetic data generation with realistic patterns
- Time-based train/validation/test splitting
- Comprehensive data validation and quality checks
- Advanced feature engineering (lags, rolling stats, seasonality)
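The time-based split and the lag/rolling features can be illustrated with a small stdlib sketch. It is not the project's feature code. The function names, the 70/15/15 split ratios, and the lag and window sizes are illustrative assumptions. The key properties are that the split never shuffles, so validation and test rows are strictly later than training rows, and that every feature for day `t` uses only values observed before `t`.

```python
from statistics import mean


def time_based_split(rows, train_frac=0.7, val_frac=0.15):
    """Split chronologically ordered rows into train/validation/test without shuffling."""
    n = len(rows)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


def add_lag_and_rolling_features(sales, lags=(1, 7), window=7):
    """Build one feature dict per day using only values observed before that day."""
    features = []
    for t, value in enumerate(sales):
        # Hypothetical seasonality feature: assumes day 0 is the first day of a week.
        row = {"t": t, "sales": value, "day_of_week": t % 7}
        for lag in lags:
            # Lagged target; None while the history is too short.
            row[f"lag_{lag}"] = sales[t - lag] if t >= lag else None
        past = sales[max(0, t - window):t]  # excludes day t itself
        row[f"rolling_mean_{window}"] = mean(past) if len(past) == window else None
        features.append(row)
    return features


daily_sales = list(range(30))  # toy series, not real data
train, val, test = time_based_split(daily_sales)  # 21, 4 and 5 rows
rows = add_lag_and_rolling_features(daily_sales)
```

Rows whose lag or rolling features are `None` would typically be dropped before training.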
- XGBoost: Gradient boosting for non-linear patterns
- LightGBM: Fast training with categorical support
- Ensemble: Optimized weighted average of all models
- Hyperparameter tuning with Optuna
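The README does not say how the ensemble weights are optimized. One simple scheme is to weight each model by the inverse of its validation error, so more accurate models contribute more. The sketch below shows that scheme with the standard library only. It is an illustration, not the project's implementation; the function names and model keys are hypothetical, and the project's actual weighting may differ.

```python
def inverse_error_weights(val_errors):
    """Weight each model by 1 / validation error, normalised so the weights sum to 1."""
    if any(err <= 0 for err in val_errors.values()):
        raise ValueError("validation errors must be positive")
    inverse = {name: 1.0 / err for name, err in val_errors.items()}
    total = sum(inverse.values())
    return {name: w / total for name, w in inverse.items()}


def ensemble_predict(predictions, weights):
    """Weighted average of per-model forecasts (lists of equal length)."""
    horizon = len(next(iter(predictions.values())))
    if any(len(p) != horizon for p in predictions.values()):
        raise ValueError("all models must forecast the same horizon")
    return [
        sum(weights[name] * preds[i] for name, preds in predictions.items())
        for i in range(horizon)
    ]
```

For example, validation errors of 2, 2 and 4 yield weights of 0.4, 0.4 and 0.2: halving a model's error doubles its relative weight.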
- Model performance comparison charts
- Time series predictions with confidence intervals
- Residual analysis and diagnostics
- Feature importance rankings
- Interactive plots with Plotly
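Residual analysis and confidence intervals are closely related. If validation residuals are roughly normal with zero mean, a 95% interval is the point forecast plus or minus about 1.96 residual standard deviations. The sketch below illustrates that normal approximation with the standard library. It is an assumption, not necessarily how the project's charts (or Prophet, which produces its own uncertainty intervals) compute their bounds. The function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def residuals(actual, predicted):
    """Residual = actual - predicted for each time step."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


def residual_summary(res):
    """A mean near 0 suggests no systematic bias; stdev sets the interval width."""
    return {"mean": mean(res), "stdev": stdev(res)}


def prediction_interval(point_forecasts, res, level=0.95):
    """Symmetric normal-approximation bounds around each point forecast."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    half_width = z * stdev(res)
    return [(y - half_width, y + half_width) for y in point_forecasts]
```

If the residual plot shows a trend or a strongly skewed distribution, these symmetric bounds will be misleading and a different interval method is needed.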
- Automated experiment tracking with MLflow
- Model versioning and registry
- Artifact storage in MinIO
- Production model promotion workflow
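The README does not spell out the promotion criteria. A common pattern is a gate that promotes a newly trained model only if it beats the current production model on a held-out metric. The function below is a hypothetical sketch of such a gate; the 2% margin is an arbitrary illustrative value. In MLflow the promotion step itself would happen in the model registry, for example by pointing an alias at the new version with `MlflowClient.set_registered_model_alias`.

```python
from typing import Optional


def should_promote(candidate_mape: float,
                   production_mape: Optional[float] = None,
                   min_relative_gain: float = 0.02) -> bool:
    """Hypothetical promotion gate on MAPE (lower is better)."""
    if production_mape is None:
        # No model in production yet: promote the first candidate.
        return True
    # Require the candidate to beat production by at least min_relative_gain.
    return candidate_mape <= production_mape * (1 - min_relative_gain)
```

Requiring a margin rather than any improvement avoids churning the production model on differences that are within run-to-run noise.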
- Multiple Input Methods: CSV upload, manual entry, sample data
- Model Selection: Individual models or ensemble
- Interactive Visualizations: Real-time prediction plots
- Confidence Intervals: 95% prediction bounds
- Export Capabilities: Download predictions as CSV
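Exporting predictions amounts to serializing the forecast table to CSV text that the browser can download. A stdlib sketch follows. The function name and column names (`date`, `prediction`, `lower_95`, `upper_95`) are assumptions, not the app's actual schema. In Streamlit, a string like this could be passed as `data` to `st.download_button` together with `file_name="predictions.csv"` and `mime="text/csv"`.

```python
import csv
import io


def predictions_to_csv(dates, predictions, intervals):
    """Serialize forecasts and their bounds to CSV text (header + one row per date)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["date", "prediction", "lower_95", "upper_95"])
    for day, pred, (low, high) in zip(dates, predictions, intervals):
        # Round to 2 decimals for readability in the exported file.
        writer.writerow([day, round(pred, 2), round(low, 2), round(high, 2)])
    return buffer.getvalue()
```

Building the CSV in memory avoids writing temporary files inside the Streamlit container.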
# Simplified prediction flow
Input Data → Feature Engineering → Model Prediction → Visualization → Export

- Training Time: ~2-5 minutes for the full pipeline
- Prediction Latency: <100ms per forecast
- Model Accuracy: MAPE < 5% on the held-out test split of the synthetic data
- Ensemble Performance: 15-20% improvement over individual models
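For reference, MAPE is the mean of |actual - forecast| / |actual|, expressed as a percentage, and an ensemble's improvement can be read as the relative reduction in that error. The README does not state which metric the 15-20% figure refers to; the sketch below assumes MAPE, and the function names are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Undefined if any actual value is 0."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_error, ensemble_error):
    """Percentage reduction in error of the ensemble relative to a single model."""
    return 100.0 * (baseline_error - ensemble_error) / baseline_error
```

As an illustrative calculation, an ensemble MAPE of 4.0% against a single-model MAPE of 5.0% is a 20% relative improvement.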
- Services not starting: Check Docker memory allocation (8GB minimum)
- Models not loading: Ensure training DAG has completed successfully
- Port conflicts: Stop conflicting services or modify ports in docker-compose
- MLflow connection: Verify MLflow service is running and accessible
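To check whether the services started by `astro dev start` actually respond, a small stdlib script can request each UI. This is a hedged sketch: the URLs come from this README, while the `/health` path assumes the MLflow tracking server's built-in health endpoint. A service that answers with an HTTP error status (for example, a 401 from an authenticated UI) is reported as unreachable even though it is running.

```python
from urllib.request import urlopen

# Endpoints from this README; "/health" on MLflow is an assumption.
SERVICE_URLS = {
    "airflow": "http://localhost:8080",
    "streamlit": "http://localhost:8501",
    "mlflow": "http://localhost:5001/health",
    "minio_console": "http://localhost:9001",
}


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError and HTTPError are OSError subclasses: refused, timeout, or error status.
        return False


if __name__ == "__main__":
    for name, url in SERVICE_URLS.items():
        print(f"{name:14} {'OK' if is_reachable(url) else 'UNREACHABLE'}  {url}")
```

If MLflow shows as unreachable while the other services respond, its container logs are the next place to look.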
# Check Airflow logs
astro dev logs
# Check specific service logs
docker-compose -f docker-compose.override.yml logs mlflow
docker-compose -f docker-compose.override.yml logs streamlit

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.