Python for Data Science - Assignment 2

Abhishek Kad

Published Dec 9, 2021

Today I am going to share my second Python assignment, which uses an e-commerce dataset.

While working on this assignment I ran into a lot of issues: some syntax threw errors, and some of it I was using for the first time, so I took help from my Bard Infinity coach, Google, Kaggle, Data World, and the W3Schools website.

Let's talk about my dataset, which consists of e-commerce data. To work with it I used Python libraries such as NumPy, pandas, Matplotlib, and Seaborn. This problem statement mostly covers different kinds of charts.

First, I loaded the dataset into Python to get an overview of it.
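A minimal sketch of this loading step, assuming the data is a CSV file. The column names (InvoiceNo, CustomerID, Country, Quantity, UnitPrice, InvoiceDate) and the three rows are hypothetical stand-ins, embedded as a string so the snippet runs on its own; with the real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real e-commerce CSV file.
csv_text = """InvoiceNo,CustomerID,Country,Quantity,UnitPrice,InvoiceDate
1001,C1,UK,6,2.50,2010-12-01 08:26
1002,C2,France,3,4.00,2010-12-03 14:05
1003,C1,UK,2,1.25,2011-01-15 14:30
"""

# parse_dates turns the InvoiceDate strings into real datetimes on load.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["InvoiceDate"])

print(df.shape)   # (rows, columns)
print(df.head())  # first few rows
df.info()         # column dtypes and non-null counts
```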

I calculated Sales by multiplying Quantity by Unit Price.
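The Sales column is an element-wise product of two columns. A small sketch, assuming the columns are named Quantity and UnitPrice (hypothetical names) and using made-up values:

```python
import pandas as pd

# Hypothetical rows; "Quantity" and "UnitPrice" are assumed column names.
df = pd.DataFrame({
    "Quantity": [6, 2, 3],
    "UnitPrice": [2.50, 1.25, 4.00],
})

# Row-by-row product: each row gets its own line total.
df["Sales"] = df["Quantity"] * df["UnitPrice"]
print(df)
```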

As part of EDA (Exploratory Data Analysis), I detected outliers using box plots.
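A box plot flags outliers with the 1.5 × IQR rule: with default settings in Matplotlib and Seaborn (whis=1.5), points below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are drawn individually. The sketch below computes the same points numerically on a hypothetical Quantity series, which makes it easy to count or filter them:

```python
import pandas as pd

# Hypothetical Quantity values with one obvious extreme.
quantity = pd.Series([1, 2, 2, 3, 3, 4, 5, 80], name="Quantity")

# Quartiles and interquartile range.
q1, q3 = quantity.quantile(0.25), quantity.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values beyond the whisker limits are what a default box plot marks as outliers.
outliers = quantity[(quantity < lower) | (quantity > upper)]
print(outliers.tolist())
```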

I plotted histograms for all numerical variables.
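A histogram is just bin counts drawn as bars. This sketch loops over every numeric column of a hypothetical DataFrame and computes the counts with np.histogram. The choice of 4 bins is arbitrary and only for illustration; pandas' DataFrame.hist (which needs Matplotlib) would draw the charts themselves.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric columns; the "Country" column is skipped automatically.
df = pd.DataFrame({
    "Quantity": [1, 2, 2, 3, 3, 3, 4, 10],
    "UnitPrice": [0.5, 1.0, 1.0, 1.5, 2.0, 2.0, 2.5, 9.0],
    "Country": ["UK"] * 8,
})

hist = {}
for col in df.select_dtypes(include="number").columns:
    # counts[i] = number of values falling in [edges[i], edges[i + 1]).
    counts, edges = np.histogram(df[col], bins=4)
    hist[col] = counts
    print(col, counts.tolist(), edges.round(2).tolist())
```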

Next, I calculated the minimum, maximum, and total Quantity using the groupby and aggregation functions, and did the same for Sales.
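One way to get all six figures in a single call is pandas' named aggregation. The post does not say which column was grouped on, so grouping by Country here is an assumption, and the rows are hypothetical:

```python
import pandas as pd

# Hypothetical rows; grouping by "Country" is an assumed choice.
df = pd.DataFrame({
    "Country": ["UK", "UK", "France", "France", "Germany"],
    "Quantity": [6, 2, 3, 10, 4],
    "UnitPrice": [2.5, 1.25, 4.0, 1.0, 3.0],
})
df["Sales"] = df["Quantity"] * df["UnitPrice"]

# Named aggregation: output column name = (input column, aggregation function).
summary = df.groupby("Country").agg(
    min_quantity=("Quantity", "min"),
    max_quantity=("Quantity", "max"),
    total_quantity=("Quantity", "sum"),
    min_sales=("Sales", "min"),
    max_sales=("Sales", "max"),
    total_sales=("Sales", "sum"),
)
print(summary)
```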

I extracted all the unique values across the dataset.
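A small sketch of extracting unique values column by column, on hypothetical data: nunique gives the count per column, and Series.unique returns the distinct values in order of first appearance.

```python
import pandas as pd

# Hypothetical categorical columns.
df = pd.DataFrame({
    "Country": ["UK", "UK", "France", "Germany"],
    "CustomerID": ["C1", "C1", "C2", "C3"],
})

# Number of distinct values in each column.
print(df.nunique())

# The distinct values themselves, keyed by column name.
unique_values = {col: df[col].unique().tolist() for col in df.columns}
print(unique_values)
```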

I extracted all the duplicate values across the dataset.
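Duplicates are usually checked at the row level with DataFrame.duplicated. A sketch on hypothetical rows: keep=False marks every copy of a repeated row, while the default (keep="first") marks only the repeats after the first one, which is what drop_duplicates removes.

```python
import pandas as pd

# Hypothetical rows; the first two are identical.
df = pd.DataFrame({
    "InvoiceNo": [1001, 1001, 1002, 1003],
    "Quantity": [6, 6, 3, 2],
})

# Every row that has an identical twin somewhere in the frame.
dupes = df[df.duplicated(keep=False)]
print(dupes)

# Rows that are repeats of an earlier row.
n_repeats = df.duplicated().sum()
print("Duplicate rows beyond the first occurrence:", n_repeats)

# Keep only the first copy of each row.
deduped = df.drop_duplicates()
```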

I plotted a correlation heatmap of all numerical variables.
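A heatmap only colours a correlation matrix, so the matrix is the part worth computing first. This sketch keeps only numeric columns before calling corr (Pearson by default), which avoids errors from text columns. The data is hypothetical. The seaborn.heatmap call in the comment is one common way to draw the matrix; it is not executed here.

```python
import pandas as pd

# Hypothetical data with one non-numeric column.
df = pd.DataFrame({
    "Quantity": [1, 2, 3, 4, 5],
    "UnitPrice": [5.0, 4.0, 3.0, 2.0, 1.0],
    "Sales": [5.0, 8.0, 9.0, 8.0, 5.0],
    "Country": ["UK", "UK", "France", "France", "UK"],
})

# Pearson correlation between the numeric columns only.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))

# To draw it: seaborn.heatmap(corr, annot=True) colours each cell by its value.
```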

I drew regression plots for all numerical variables.
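By default, a regression plot such as seaborn.regplot overlays an ordinary least-squares straight line on a scatter plot. This sketch computes that line directly with np.polyfit on hypothetical Quantity and Sales values:

```python
import numpy as np
import pandas as pd

# Hypothetical pair of numeric columns.
df = pd.DataFrame({
    "Quantity": [1, 2, 3, 4, 5],
    "Sales": [2.0, 4.0, 6.0, 8.0, 10.0],
})

# Degree-1 least-squares fit: Sales ≈ slope * Quantity + intercept.
slope, intercept = np.polyfit(df["Quantity"], df["Sales"], deg=1)
print(f"Sales ≈ {slope:.2f} * Quantity + {intercept:.2f}")
```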

I made bar plots of categorical variables against numerical variables.
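Each bar in a categorical-vs-numerical bar plot is one summary value per category. seaborn.barplot uses the mean by default, so a groupby mean gives the bar heights. Country and Sales are hypothetical column names here:

```python
import pandas as pd

# Hypothetical categorical and numeric columns.
df = pd.DataFrame({
    "Country": ["UK", "UK", "France", "France", "Germany"],
    "Sales": [15.0, 5.0, 12.0, 6.0, 12.0],
})

# Mean Sales per Country, tallest bar first.
bar_heights = df.groupby("Country")["Sales"].mean().sort_values(ascending=False)
print(bar_heights)
```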

Then I added Month, Day, and Hour columns for each invoice.
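A sketch of deriving these columns with pandas' .dt accessor. The column name InvoiceDate and the timestamps are hypothetical. The post does not say whether "Day" means day of the month or day of the week. The weekday name is assumed here; .dt.day would give the day of the month instead.

```python
import pandas as pd

# Hypothetical invoices with timestamp strings.
df = pd.DataFrame({
    "InvoiceNo": [1001, 1002, 1003],
    "InvoiceDate": ["2010-12-01 08:26", "2011-01-15 14:05", "2011-01-16 09:30"],
})

# Parse to datetime first so the .dt accessor is available.
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
df["Month"] = df["InvoiceDate"].dt.month
df["Day"] = df["InvoiceDate"].dt.day_name()  # assumed: weekday name
df["Hour"] = df["InvoiceDate"].dt.hour
print(df)
```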

Top 5 customers by number of orders
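In transaction data, one order (invoice) often spans several rows, so a plain row count can overstate orders. This sketch counts distinct invoices per customer and then takes the five largest. CustomerID and InvoiceNo are hypothetical column names, and the data is made up.

```python
import pandas as pd

# Hypothetical line items: invoice 1001 has two rows but is one order.
df = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C1", "C2", "C2", "C3", "C4", "C5", "C6", "C6"],
    "InvoiceNo": [1001, 1001, 1004, 1002, 1005, 1003, 1006, 1007, 1008, 1009],
})

# Distinct invoices per customer, then the five highest counts.
orders_per_customer = df.groupby("CustomerID")["InvoiceNo"].nunique()
top5 = orders_per_customer.nlargest(5)
print(top5)
```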

How much money did each customer spend?

Top 5 customers by money spent
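Both spending questions come from the same grouped sum: total Sales per customer, then the five largest totals. The column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical line items.
df = pd.DataFrame({
    "CustomerID": ["C1", "C1", "C2", "C3", "C4", "C5", "C6"],
    "Quantity": [6, 2, 3, 10, 1, 4, 5],
    "UnitPrice": [2.5, 1.25, 4.0, 1.0, 3.0, 0.5, 2.0],
})
df["Sales"] = df["Quantity"] * df["UnitPrice"]

# Total money spent by each customer.
spend_per_customer = df.groupby("CustomerID")["Sales"].sum()

# The five biggest spenders, largest first.
top5_spenders = spend_per_customer.nlargest(5)
print(top5_spenders)
```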

How many orders per month?

How many orders per day?

How many orders per hour?

How many orders for each country?
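All four questions follow the same pattern: group by the time or location column and count distinct invoices. This sketch assumes hypothetical InvoiceNo, Country, and InvoiceDate columns, and takes the weekday name as "Day".

```python
import pandas as pd

# Hypothetical line items; invoice 1001 appears twice but is one order.
df = pd.DataFrame({
    "InvoiceNo": [1001, 1001, 1002, 1003, 1004],
    "Country": ["UK", "UK", "France", "UK", "Germany"],
    "InvoiceDate": pd.to_datetime([
        "2010-12-01 08:26", "2010-12-01 08:26",
        "2010-12-03 14:05", "2011-01-15 14:30", "2011-01-16 09:10",
    ]),
})
df["Month"] = df["InvoiceDate"].dt.month
df["Day"] = df["InvoiceDate"].dt.day_name()  # assumed: weekday name
df["Hour"] = df["InvoiceDate"].dt.hour

# Distinct orders per month, day, hour and country.
orders_per = {
    col: df.groupby(col)["InvoiceNo"].nunique()
    for col in ["Month", "Day", "Hour", "Country"]
}
for col, counts in orders_per.items():
    print(counts, end="\n\n")
```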

Thank you!!!

Comment by Kishan Kanhaiya:

Good job, Abhishek. I would like to suggest a couple of things for you to try, as they can help you bring out more insights during data exploration. During EDA, check the distribution of each numerical feature to see whether it is normally distributed or skewed. If it is skewed or doesn't follow a normal distribution, think about what sort of transformation you could try. :)
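A minimal sketch of that suggestion on a hypothetical, right-skewed Sales series. It measures sample skewness with pandas and then applies a log transform (np.log1p, i.e. log(1 + x), which assumes non-negative values). The log transform is only one option; square-root or Box-Cox transforms are common alternatives.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed values: many small, a few very large.
sales = pd.Series([1, 2, 2, 3, 3, 4, 5, 8, 20, 150], dtype=float, name="Sales")

# Sample skewness: about 0 when symmetric, positive for a long right tail.
skew_before = sales.skew()
print("skew before:", round(skew_before, 2))

# log1p compresses large values and is defined at 0 (needs x >= 0).
log_sales = np.log1p(sales)
skew_after = log_sales.skew()
print("skew after log1p:", round(skew_after, 2))
```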
