Implements Principal Component Analysis (PCA) and an autoencoder on the MNIST handwritten digits and Faces in the Wild datasets to reduce the dimensionality of the images, visualize the result, and reconstruct the original images, using MATLAB.
All code is written in MATLAB and is compatible with MATLAB R2016a and later versions. There are 2 main scripts, one for each dataset.
To run the MNIST dataset, run the script ‘MNSIT_Run.m’. It calculates the MSE and PSNR of the PCA reconstruction for both the training and testing data, and the MSE and PSNR of the autoencoder reconstruction. The number of components is set to 50 by default and can be changed. KNN classification is also run on the reduced data produced by both PCA and the autoencoder, with k=5 by default, and the accuracy of each is reported.
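The PCA reconstruction metrics described above can be sketched as follows. This is a minimal illustration on synthetic data, not the code in ‘MNSIT_Run.m’: the variable names, the random stand-in data, the use of svd to obtain the principal directions, and the assumption that pixel values lie in [0, 1] (so the PSNR peak value is 1) are all assumptions made for the example. It uses bsxfun rather than implicit expansion so that it also runs on R2016a.

```matlab
% Synthetic stand-in for image data: one flattened image per row, pixels in [0, 1].
X = rand(200, 64);
k = 10;                                % number of principal components to keep

mu = mean(X, 1);                       % per-pixel mean over all images
Xc = bsxfun(@minus, X, mu);            % center the data

[~, ~, V] = svd(Xc, 'econ');           % columns of V are the principal directions
W = V(:, 1:k);                         % keep the top-k directions

Z = Xc * W;                            % reduced representation (200 x k)
Xrec = bsxfun(@plus, Z * W', mu);      % reconstruction in the original space

mseVal = mean((X(:) - Xrec(:)).^2);    % mean squared reconstruction error
peak = 1;                              % assumed maximum pixel value
psnrVal = 10 * log10(peak^2 / mseVal); % peak signal-to-noise ratio in dB

fprintf('MSE = %.6f, PSNR = %.2f dB\n', mseVal, psnrVal);
```

Raising k lowers the MSE and raises the PSNR; with k equal to the number of pixels the reconstruction is exact up to rounding.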
To run the Faces in the Wild dataset, run the script ‘faces_in_the_wild.m’. It loads the training data first and then computes the MSE and PSNR of the PCA and autoencoder reconstructions for 1500 components (default).
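An autoencoder reconstruction of the kind described above might look like the sketch below. It assumes the trainAutoencoder, encode, and predict functions from the Neural Network Toolbox (called Deep Learning Toolbox in newer releases); this README does not state which routine the scripts actually call, so treat that as an assumption. The data is synthetic, and the hidden size and epoch count are scaled far down from the 1500 components used for Faces in the Wild so the example finishes quickly.

```matlab
% Synthetic stand-in for image data: one flattened image per COLUMN, pixels in [0, 1].
X = rand(64, 200);
hiddenSize = 16;                       % illustrative; the script default is 1500

% Train a single-hidden-layer autoencoder (requires the Neural Network Toolbox).
autoenc = trainAutoencoder(X, hiddenSize, ...
    'MaxEpochs', 50, ...
    'ShowProgressWindow', false);

Z = encode(autoenc, X);                % reduced representation (hiddenSize x 200)
Xrec = predict(autoenc, X);            % reconstruction in the original space

mseVal = mean((X(:) - Xrec(:)).^2);    % mean squared reconstruction error
psnrVal = 10 * log10(1 / mseVal);      % PSNR in dB, assuming a peak pixel value of 1

fprintf('Autoencoder MSE = %.6f, PSNR = %.2f dB\n', mseVal, psnrVal);
```

Note that trainAutoencoder expects samples as columns, the transpose of the row-per-sample layout used by many PCA and KNN routines.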
For modularity, the code uses separate modules for the PCA computations (PCA MSE and PSNR), training the autoencoder, training the KNN classifier, and calculating the accuracy.
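The KNN training and accuracy steps can be sketched as below. This is a hedged example rather than the module shipped with the code: it assumes fitcknn and predict from the Statistics and Machine Learning Toolbox, and it uses two well-separated synthetic classes in place of the PCA or autoencoder features. All variable names are hypothetical.

```matlab
% Two well-separated synthetic classes standing in for reduced features
% (one observation per row).
nTrain = 50; nTest = 20; d = 5;
Xtrain = [randn(nTrain, d); randn(nTrain, d) + 10];
Ytrain = [zeros(nTrain, 1); ones(nTrain, 1)];
Xtest  = [randn(nTest, d); randn(nTest, d) + 10];
Ytest  = [zeros(nTest, 1); ones(nTest, 1)];

% Train a KNN classifier with k = 5 (requires the Statistics and Machine Learning Toolbox).
mdl = fitcknn(Xtrain, Ytrain, 'NumNeighbors', 5);

Ypred = predict(mdl, Xtest);           % predicted labels for the test set
accuracy = mean(Ypred == Ytest);       % fraction of correctly classified samples

fprintf('KNN accuracy (k = 5): %.2f%%\n', 100 * accuracy);
```

Running the same steps once on the PCA features and once on the autoencoder features gives the two accuracies to compare.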
All training data should be in the root folder, as provided in the zip file.
The total running time of the code was approximately 120 to 150 minutes.
The MNIST dataset took approximately 30-45 minutes to run completely: 5 minutes for the PCA operations (50 components), 25-30 minutes to train the autoencoder with 50 hidden neurons for 100 iterations, and 5 minutes to train the KNN classifier and predict with it.
The Faces in the Wild dataset took approximately 90-120 minutes to run completely. The majority of the time was spent training the autoencoder with 1500 hidden neurons for 50 iterations. The rest of the time was divided between reconstructing the data from the autoencoder and from PCA and then calculating their MSE and PSNR.