NOTE: This project is still unfinished and will continue to expand over the coming months.
This project compares the performance of several machine learning models on the EMBER dataset, focusing on the dataset's vectorized features. To use the project, open the malware.ipynb notebook and walk through it. The notebook also contains some training results, which help in comparing the quality of each model. Some of the included models are:
- LightGBM (baseline EMBER model)
- CatBoost
- XGBoost
- Neural Network
The project also includes model architectures taken from:
- Puranik, Piyush Aniruddha. "Static Malware Detection using Deep Neural Networks on Portable Executables" (2019). UNLV Theses, Dissertations, Professional Papers, and Capstones. 3744. http://dx.doi.org/10.34917/16076285
- Lad, Sumit and Adamuthe, Amol. "Improved Deep Learning Model for Static PE Files Malware Detection and Classification" (2022). International Journal of Computer Network and Information Security. 14. 14-26. doi:10.5815/ijcnis.2022.02.02
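
As a rough illustration of what working with the vectorized features can look like, here is a minimal sketch. It is not taken from malware.ipynb. The helper names `drop_unlabeled` and `train_lightgbm` are hypothetical, the hyperparameters are placeholders, and a small synthetic array stands in for the real feature matrix. It assumes the EMBER convention of labeling samples 0 (benign), 1 (malicious), and -1 (unlabeled), with unlabeled rows removed before supervised training.

```python
import numpy as np


def drop_unlabeled(X, y):
    """Keep only rows labeled 0 (benign) or 1 (malicious).

    Assumes unlabeled samples are marked with -1, as in EMBER.
    """
    mask = y != -1
    return X[mask], y[mask]


def train_lightgbm(X, y):
    """Fit a LightGBM binary classifier.

    Hypothetical helper: the settings here are illustrative and are
    not the ones used in the notebook.
    """
    import lightgbm as lgb  # imported lazily; only needed for training

    model = lgb.LGBMClassifier(n_estimators=100)
    model.fit(X, y)
    return model


# Synthetic stand-in for the vectorized features (assumption: the real
# arrays would be loaded from the EMBER dataset instead).
rng = np.random.default_rng(0)
X = rng.random((6, 4)).astype(np.float32)
y = np.array([0, 1, -1, 1, -1, 0])

X_labeled, y_labeled = drop_unlabeled(X, y)
print(X_labeled.shape)  # (4, 4)
```

With the real data, the labeled arrays would then be passed to `train_lightgbm` (or to an equivalent CatBoost, XGBoost, or neural network trainer) so that each model is trained and evaluated on the same split.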