A Data Analytics Reinforcement Project by Vishnu Raj
Employee attrition (or employee turnover) is a major concern for organizations today.
The goal of this project is to analyze employee data and understand the factors that influence attrition, helping HR teams to make smarter, data-driven retention decisions.
This analysis focuses on Exploratory Data Analysis (EDA) and Statistical Testing using a synthetic dataset, simulating real-world HR conditions.
- Identify which factors are strongly linked to employee attrition
- Explore demographic, job, and performance data for patterns
- Apply statistical tests to check for significant relationships
- Visualize data clearly for better storytelling and HR insights
| Purpose | Tool / Library |
|---|---|
| Data Handling | pandas, numpy |
| Visualization | matplotlib, seaborn |
| Statistical Testing | scipy.stats |
| Development | Jupyter Notebook |
| Version Control | Git & GitHub |
- Dataset Name: Employee Attrition.csv
- Records: 59,598
- Columns: 24 (Age, Income, Role, Satisfaction, etc.)
- Target Variable: Attrition (Stayed / Left)

The dataset is synthetic, meaning it is computer-generated to simulate real HR data.
This lets us analyze realistic scenarios without privacy concerns while keeping the logic and variability close to real-world conditions.
- Checked for duplicates and verified that there are no missing values
- Converted categorical columns to numerical for correlation and analysis
- Applied the IQR method to detect and remove outliers from Monthly Income and Years at Company
- Cleaned and structured the data for EDA and statistical testing
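The duplicate, missing-value, and encoding steps above can be sketched roughly as follows. This is a minimal illustration on a toy DataFrame, not the project's notebook code: the column names, the toy rows, and the 0/1 mapping for `Attrition` are assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the real file; column names and values are assumptions.
df = pd.DataFrame({
    "Age": [29, 41, 35, 41],
    "Gender": ["Male", "Female", "Female", "Female"],
    "Monthly Income": [4200, 6100, 5300, 6100],
    "Attrition": ["Left", "Stayed", "Stayed", "Stayed"],
})

# Count exact duplicate rows and missing cells.
n_duplicates = int(df.duplicated().sum())
n_missing = int(df.isna().sum().sum())

# Drop duplicate rows (a no-op when there are none).
df = df.drop_duplicates().reset_index(drop=True)

# Encode categorical columns as integers so they can enter a correlation matrix.
encoded = df.copy()
encoded["Attrition"] = encoded["Attrition"].map({"Stayed": 0, "Left": 1})
encoded["Gender"] = encoded["Gender"].astype("category").cat.codes

print(n_duplicates, n_missing)
print(encoded)
```

Mapping the target explicitly (rather than relying on automatic codes) keeps the direction of later correlations easy to read: a positive value then means "associated with leaving".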
Analyzed relationships between attrition and key variables such as:
- Gender
- Job Role
- Work-Life Balance
- Monthly Income
- Years at Company
Example:
Employees with poorer work-life balance and fewer promotions were more likely to leave, while income had minimal effect.
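One simple way to look at a relationship like this is the share of leavers within each category. The sketch below uses a made-up six-row sample; the column name `Work-Life Balance` and its labels are assumptions, and the numbers are not results from the dataset.

```python
import pandas as pd

# Illustrative rows only; labels and column names are assumptions.
df = pd.DataFrame({
    "Work-Life Balance": ["Poor", "Poor", "Good", "Good", "Poor", "Good"],
    "Attrition": ["Left", "Left", "Stayed", "Stayed", "Stayed", "Left"],
})

# Row-normalised crosstab: proportion of Stayed / Left within each category.
rates = pd.crosstab(df["Work-Life Balance"], df["Attrition"], normalize="index")

# Attrition rate per category.
left_rate = rates["Left"]
print(left_rate)
```

The same pattern works for Gender or Job Role by swapping the first argument of `pd.crosstab`.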
| Test | Comparison | Purpose | Result |
|---|---|---|---|
| Chi-Square Test | Gender vs Attrition | Check if gender influences attrition | Significant association |
| Independent t-test | Monthly Income vs Attrition | Check if salary differs between groups | No significant difference |
| Independent t-test | Promotions vs Attrition | Check if promotions affect attrition | Significant difference |
| Independent t-test | Distance vs Attrition | Check if commute distance affects attrition | Significant difference |
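For the categorical comparison in the table, a chi-square test of independence could be run along these lines with `scipy.stats.chi2_contingency`. The counts below are invented for illustration and the column names are assumptions; note that SciPy applies Yates' continuity correction by default for 2x2 tables.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Invented sample: 40 employees per gender with different leave counts.
df = pd.DataFrame({
    "Gender": ["Male"] * 40 + ["Female"] * 40,
    "Attrition": ["Left"] * 25 + ["Stayed"] * 15 + ["Left"] * 12 + ["Stayed"] * 28,
})

# Contingency table of observed counts (rows: Gender, columns: Attrition).
table = pd.crosstab(df["Gender"], df["Attrition"])

# Test of independence between the two categorical variables.
chi2, p_value, dof, expected = chi2_contingency(table)

# Conventional 5% significance threshold, as used in the write-up.
significant = p_value < 0.05
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}, significant={significant}")
```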
The t-test for promotions showed a significant difference (p < 0.05):
the average number of promotions differed between the two groups, with employees who left having received fewer.
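A two-sample comparison like this one can be sketched with `scipy.stats.ttest_ind`. The promotion counts below are made up, and `equal_var=False` (Welch's variant) is a choice made for this example; the write-up does not say which variant was used.

```python
from scipy.stats import ttest_ind

# Made-up promotion counts for the two groups (not values from the dataset).
promotions_stayed = [2, 1, 1, 2, 0, 1, 2, 1, 1, 2]
promotions_left = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]

# Welch's t-test: does not assume equal variances in the two groups.
t_stat, p_value = ttest_ind(promotions_stayed, promotions_left, equal_var=False)

# A positive statistic here means the "stayed" group has the higher mean.
significant = p_value < 0.05
print(f"t={t_stat:.2f}, p={p_value:.4f}, significant={significant}")
```

The same call applies to Monthly Income or commute distance by passing the corresponding columns split by attrition status.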
Employees who left the company:
- Were slightly younger (avg. 37.9 yrs vs 39.1 yrs)
- Had fewer promotions and longer commutes
- Had similar salaries to those who stayed
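Group-level comparisons of this kind usually come from a grouped mean. A minimal sketch, with invented rows and assumed column names (`Number of Promotions`, `Distance from Home`):

```python
import pandas as pd

# Invented rows; the real dataset has 59,598 records and other columns.
df = pd.DataFrame({
    "Attrition": ["Left", "Left", "Stayed", "Stayed"],
    "Age": [36, 38, 40, 42],
    "Number of Promotions": [0, 1, 1, 2],
    "Distance from Home": [30, 40, 10, 20],
})

# Mean of each numeric column within the Left and Stayed groups.
numeric_cols = ["Age", "Number of Promotions", "Distance from Home"]
summary = df.groupby("Attrition")[numeric_cols].mean()
print(summary)
```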
Conclusion:
Attrition is not primarily salary-driven; it is more influenced by career growth, commute distance, and work-life balance.
Key Observations:
- Work-Life Balance has a moderate positive correlation with Attrition
- Job Level and Distance from Home show visible differences between groups
- Most other factors show weak correlation, which is consistent with attrition being multi-factorial
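Observations like these can be read off the correlation of each encoded column with the target. The sketch below assumes `Attrition` is coded 1 for Left and 0 for Stayed, and that `Work-Life Balance` is coded so that higher numbers mean worse balance; both codings are assumptions, and the sign of a correlation depends on them.

```python
import pandas as pd

# Small encoded sample; values and codings are illustrative assumptions.
encoded = pd.DataFrame({
    "Attrition": [1, 1, 0, 0, 1, 0],           # 1 = Left, 0 = Stayed
    "Work-Life Balance": [4, 3, 2, 1, 3, 2],   # higher = worse (assumed)
    "Monthly Income": [50, 52, 51, 49, 50, 51],
})

# Pearson correlation of every feature with the target,
# ordered from strongest to weakest by absolute value.
corr_with_attrition = (
    encoded.corr()["Attrition"]
    .drop("Attrition")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_attrition)
```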
Outliers were identified using the Interquartile Range (IQR) method.
Extreme values were removed in:
- Monthly Income
- Years at Company
This improved the reliability of further analysis and visualizations.
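A common form of the IQR rule keeps values within `Q1 - 1.5 * IQR` and `Q3 + 1.5 * IQR`, where `IQR = Q3 - Q1`. The 1.5 multiplier is the conventional default and is an assumption here, as are the helper name `remove_iqr_outliers` and the toy values.

```python
import pandas as pd

def remove_iqr_outliers(frame, columns, k=1.5):
    """Keep rows whose values fall inside the IQR fences for every given column."""
    mask = pd.Series(True, index=frame.index)
    for col in columns:
        q1 = frame[col].quantile(0.25)
        q3 = frame[col].quantile(0.75)
        iqr = q3 - q1
        # Fences are computed on the original frame for each column.
        mask &= frame[col].between(q1 - k * iqr, q3 + k * iqr)
    return frame[mask]

# Toy data with one extreme income value.
df = pd.DataFrame({
    "Monthly Income": [4000, 4500, 5000, 5500, 6000, 50000],
    "Years at Company": [2, 3, 5, 4, 6, 5],
})

cleaned = remove_iqr_outliers(df, ["Monthly Income", "Years at Company"])
print(len(df), "->", len(cleaned))
```

Computing all fences before filtering means the result does not depend on the order in which the columns are listed.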
This project highlights how data analysis and hypothesis testing can uncover meaningful trends behind employee behavior.
By studying career growth, work-life balance, and commute distance, HR teams can focus on improving these areas to reduce attrition rates effectively.
Vishnu Raj
Data Analytics Reinforcement Project
GitHub | LinkedIn | vishnuskillx@gmail.com
If you found this project interesting, please give it a star!







