This project addresses class imbalance in classification tasks using Python-based machine learning techniques. Class imbalance is common in real-world problems such as fraud detection and rare disease prediction, where the class of interest makes up only a small fraction of the data.
- Data visualization and analysis
- Handling imbalance using:
  - Oversampling (Random, SMOTE, ADASYN)
  - Undersampling
- Model training using:
  - Random Forest
  - XGBoost
- Evaluation with metrics suited for imbalance:
  - F1-score, ROC-AUC, Confusion Matrix
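
The workflow listed above can be sketched end to end in a few lines. This is a minimal illustration, not the project's actual code: the dataset is synthetic, the variable names are hypothetical, and random oversampling is done with `sklearn.utils.resample` as a stand-in for the imbalanced-learn samplers (Random, SMOTE, ADASYN) the project uses. Only the training split is resampled, so the test set keeps its original class ratio.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic binary dataset with roughly 5% positives (an assumption for illustration).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Random oversampling: draw minority rows with replacement until the
# minority class matches the majority class in size (training data only).
minority = y_train == 1
n_majority = int((~minority).sum())
X_min_up, y_min_up = resample(
    X_train[minority],
    y_train[minority],
    replace=True,
    n_samples=n_majority,
    random_state=0,
)
X_bal = np.vstack([X_train[~minority], X_min_up])
y_bal = np.concatenate([y_train[~minority], y_min_up])

# Train a Random Forest on the balanced training set.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_bal, y_bal)

# Evaluate on the untouched, still-imbalanced test set.
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]

f1 = f1_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)
cm = confusion_matrix(y_test, y_pred)

print(f"F1: {f1:.3f}  ROC-AUC: {auc:.3f}")
print("Confusion matrix (rows = true class, columns = predicted class):")
print(cm)
```

Accuracy is deliberately left out: on data this skewed, a model that always predicts the majority class would score about 95% accuracy while finding no positives, which is why F1-score, ROC-AUC, and the confusion matrix are used instead.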
- Python 3.8+
- pandas, numpy, matplotlib, seaborn
- scikit-learn, imbalanced-learn
- xgboost
- notebooks/: Jupyter Notebook implementation
- src/: Python scripts for data handling and modeling
- outputs/: Generated charts and results
- report/: Final PDF project report
git clone https://github.com/your-username/classification-imbalanced-data-ml.git
cd classification-imbalanced-data-ml
pip install -r requirements.txt