Python for Data Science - Assignment 2
Today I am going to share my second Python assignment, which uses an E-Commerce dataset.
While solving this assignment I faced a lot of issues: some of the syntax was throwing errors, and some of it I was using for the first time, so I took help from my Bard Infinity Coach, Google, Kaggle, Data World and the W3Schools website.
Let's talk about my dataset, which consists of E-Commerce data. I used different Python libraries like 'NumPy', 'Pandas', 'Matplotlib', 'Seaborn' etc. This problem statement mostly covers different kinds of charts.
First I loaded the dataset into Python to get an overview
I calculated Sales by multiplying Quantity and Unit Price
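A minimal sketch of this step, assuming the columns are named `Quantity` and `UnitPrice` (the real column names may differ) and using a few made-up rows in place of the actual dataset:

```python
import pandas as pd

# Toy rows standing in for the real dataset; the column names are assumptions.
df = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2"],
    "Quantity": [6, 2, 10],
    "UnitPrice": [2.50, 4.00, 1.25],
})

# Element-wise product gives the sales value of each invoice line.
df["Sales"] = df["Quantity"] * df["UnitPrice"]
print(df)
```

Because the multiplication is vectorised, no explicit loop over the rows is needed.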
To perform EDA (Exploratory Data Analysis) I identified outliers by using a Boxplot
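A boxplot with default settings draws its whiskers at 1.5 times the interquartile range, and points beyond them are shown as outliers. The same rule can be applied numerically. This is only a sketch on made-up values, with `Quantity` as an assumed column name:

```python
import pandas as pd

# Made-up quantities with one obviously extreme value.
quantity = pd.Series([1, 2, 2, 3, 3, 4, 5, 120], name="Quantity")

# Quartiles and interquartile range.
q1 = quantity.quantile(0.25)
q3 = quantity.quantile(0.75)
iqr = q3 - q1

# The 1.5 * IQR fences that a default boxplot uses for its whiskers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the fences are the points a boxplot would draw as outliers.
outliers = quantity[(quantity < lower) | (quantity > upper)]
print(outliers)
```

The plot itself could be drawn with `seaborn.boxplot(x=quantity)` once Seaborn is imported.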
I used a Histogram for all numerical variables
Then going forward I calculated the Minimum Quantity, Maximum Quantity and Total Quantity by using the aggregation and groupby functions, and did the same for Sales.
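The aggregation above can be sketched as follows. The grouping key is not stated in the write-up, so `Country` is used here purely as an assumption, and the rows are made up:

```python
import pandas as pd

# Toy data; the column names and the grouping key (Country) are assumptions.
df = pd.DataFrame({
    "Country": ["UK", "UK", "France", "France", "France"],
    "Quantity": [6, 2, 10, 4, 1],
    "UnitPrice": [2.5, 4.0, 1.25, 3.0, 8.0],
})
df["Sales"] = df["Quantity"] * df["UnitPrice"]

# One groupby call returns min, max and total for both columns.
summary = df.groupby("Country")[["Quantity", "Sales"]].agg(["min", "max", "sum"])
print(summary)
```

The result has two-level column labels such as `("Quantity", "sum")`, so one table holds all six statistics per group.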
I extracted all the unique values across the dataset
I extracted all the duplicate values across the dataset
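Both of these steps have direct pandas helpers. A small sketch on made-up rows (the column names are assumptions):

```python
import pandas as pd

# Toy rows; the first and last rows are exact duplicates of each other.
df = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A1"],
    "StockCode": ["X", "Y", "X", "X"],
    "Country": ["UK", "UK", "France", "UK"],
})

# Number of distinct values in every column.
unique_counts = df.nunique()

# The distinct values themselves, column by column.
unique_values = {col: df[col].unique().tolist() for col in df.columns}

# keep=False marks every copy of a repeated row, not just the later ones.
duplicates = df[df.duplicated(keep=False)]

print(unique_counts)
print(duplicates)
```

If the duplicates should be dropped rather than only listed, `df.drop_duplicates()` returns the frame with one copy of each row.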
Correlation - Heatmap = All numeric variables
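A sketch of the correlation step, assuming the numeric columns are `Quantity`, `UnitPrice` and `Sales` and using made-up values. Only the correlation matrix is computed here:

```python
import pandas as pd

# Toy data; Country is included to show that non-numeric columns are skipped.
df = pd.DataFrame({
    "Quantity": [1, 2, 3, 4, 5],
    "UnitPrice": [10.0, 8.0, 6.0, 4.0, 2.0],
    "Country": ["UK"] * 5,
})
df["Sales"] = df["Quantity"] * df["UnitPrice"]

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric = df.select_dtypes(include="number")
corr = numeric.corr()
print(corr)
```

Passing this matrix to `seaborn.heatmap(corr, annot=True)` would draw the heatmap with the coefficients printed in each cell.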
Regression plot = All numerical variables
Then I added the columns Month, Day and Hour for the invoice
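One way to derive these columns, assuming the invoice timestamp lives in a column called `InvoiceDate` and that "Day" means day of the month (both are assumptions; `.dt.dayofweek` would give the weekday instead):

```python
import pandas as pd

# Toy timestamps stored as text, as they often are straight after loading a CSV.
df = pd.DataFrame({
    "InvoiceDate": ["2010-12-01 08:26:00", "2011-01-15 14:05:00"],
})

# Convert to real datetimes so the .dt accessor becomes available.
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])

df["Month"] = df["InvoiceDate"].dt.month
df["Day"] = df["InvoiceDate"].dt.day
df["Hour"] = df["InvoiceDate"].dt.hour
print(df)
```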
TOP 5 customers with the highest number of orders
How much money was spent by the customers?
TOP 5 customers with the highest money spent
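The three customer questions above can be sketched like this. The column names `CustomerID`, `InvoiceNo` and `Sales` are assumptions, the rows are made up, and an "order" is taken to mean one distinct invoice number:

```python
import pandas as pd

# Toy invoice lines; several lines can belong to the same invoice.
df = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 3],
    "InvoiceNo": ["A1", "A1", "A2", "B1", "B2", "C1"],
    "Sales": [10.0, 5.0, 20.0, 50.0, 1.0, 7.0],
})

# Orders per customer: count distinct invoices, not invoice lines.
orders_per_customer = df.groupby("CustomerID")["InvoiceNo"].nunique()
top_by_orders = orders_per_customer.nlargest(5)

# Money spent per customer, and the five biggest spenders.
spend_per_customer = df.groupby("CustomerID")["Sales"].sum()
top_by_spend = spend_per_customer.nlargest(5)

print(top_by_orders)
print(top_by_spend)
```

Counting with `nunique()` instead of `count()` avoids inflating the order total for customers who buy many items on a single invoice.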
How many orders per month?
How many orders per day?
How many orders per hour?
How many orders for each country?
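These four questions follow the same pattern: group by a key and count distinct invoices. A sketch on made-up rows, with `InvoiceNo`, `InvoiceDate` and `Country` as assumed column names and "day" read as day of the month:

```python
import pandas as pd

# Toy invoice lines; invoice A1 appears twice because it has two items.
df = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3"],
    "InvoiceDate": pd.to_datetime([
        "2010-12-01 08:26:00",
        "2010-12-01 08:26:00",
        "2010-12-01 09:00:00",
        "2011-01-04 09:30:00",
    ]),
    "Country": ["UK", "UK", "UK", "France"],
})

# Group by a part of the timestamp and count distinct invoices in each group.
orders_per_month = df.groupby(df["InvoiceDate"].dt.month)["InvoiceNo"].nunique()
orders_per_day = df.groupby(df["InvoiceDate"].dt.day)["InvoiceNo"].nunique()
orders_per_hour = df.groupby(df["InvoiceDate"].dt.hour)["InvoiceNo"].nunique()

# Same idea per country, sorted so the busiest country comes first.
orders_per_country = (
    df.groupby("Country")["InvoiceNo"].nunique().sort_values(ascending=False)
)

print(orders_per_month)
print(orders_per_country)
```

Each result is a Series, so calling `.plot(kind="bar")` on it would give the corresponding chart when Matplotlib is available.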
Thank You !!!
Good job, Abhishek. I would like to suggest a couple of things for you to try, as they can help you bring out more insights during data exploration. During EDA, do check the distribution of the numerical features to see whether each one is normally distributed or skewed. And if a feature is skewed or does not follow a normal distribution, think about what sort of transformation you can try. :)
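The check suggested here can be sketched as follows. The values are made up, and the log transform is just one common choice for right-skewed data, not the only option:

```python
import numpy as np
import pandas as pd

# Made-up, right-skewed values standing in for a column such as Sales.
sales = pd.Series([1, 1, 2, 2, 3, 4, 6, 10, 20, 60], name="Sales", dtype=float)

# Sample skewness: near 0 for symmetric data, positive for a long right tail.
raw_skew = sales.skew()

# log1p computes log(1 + x); it needs x > -1, so negative values
# would have to be handled separately before applying it.
log_sales = np.log1p(sales)
log_skew = log_sales.skew()

print("skew before:", raw_skew)
print("skew after log1p:", log_skew)
```

A skewness closer to zero after the transform suggests the log scale is a better fit for plots and models that assume roughly symmetric data.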