Pratik3c/sql-data-warehouse-project

📊 Data Warehouse and Analytics Project

Welcome to the Data Warehouse and Analytics Project repository! 🚀
This project showcases a complete end-to-end data engineering and analytics solution, from ingesting raw data to deriving business insights.

Designed as a portfolio project, it reflects industry best practices in modern data warehousing, ETL pipelines, dimensional modeling, and business intelligence.


🏗️ Data Architecture: Medallion Framework

The architecture follows the Medallion Architecture with three data layers:

[Diagram: Data Architecture]

| Layer | Description |
|-------|-------------|
| 🥉 Bronze | Stores raw data as ingested from ERP and CRM systems (CSV files into SQL Server). |
| 🥈 Silver | Data is cleaned, standardized, and transformed for analytical readiness. |
| 🥇 Gold | Business-ready, analytical data modeled in a star schema. Used for BI and reporting. |
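To make the three layers concrete, here is a minimal sketch of one record set moving from Bronze to Gold. It is an illustration under stated assumptions, not the project's actual scripts: SQLite (via Python's built-in `sqlite3`) stands in for SQL Server so the snippet runs anywhere, and the table names, column names, and sample rows are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Bronze: raw rows stored exactly as they arrived
CREATE TABLE bronze_crm_customers (customer_id TEXT, first_name TEXT, gender TEXT);
INSERT INTO bronze_crm_customers VALUES
    ('1', '  Ana ', 'F'),
    ('2', 'Ben',    'm'),
    ('3', ' Cleo',  NULL);

-- Silver: typed keys, trimmed names, standardized gender values
CREATE TABLE silver_crm_customers AS
SELECT CAST(customer_id AS INTEGER) AS customer_id,
       TRIM(first_name)             AS first_name,
       CASE UPPER(TRIM(gender))
            WHEN 'F' THEN 'Female'
            WHEN 'M' THEN 'Male'
            ELSE 'n/a'
       END                          AS gender
FROM bronze_crm_customers;

-- Gold: business-friendly view that reporting tools would query
CREATE VIEW gold_dim_customers AS
SELECT customer_id AS customer_key, first_name, gender
FROM silver_crm_customers;
""")

gold_rows = conn.execute(
    "SELECT customer_key, first_name, gender FROM gold_dim_customers ORDER BY customer_key"
).fetchall()
print(gold_rows)  # [(1, 'Ana', 'Female'), (2, 'Ben', 'Male'), (3, 'Cleo', 'n/a')]
```

Bronze keeps the untouched input for traceability, Silver holds the cleaning rules, and Gold only renames and exposes columns, so each layer has one job.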

📖 Project Highlights

This project includes:

✅ Modern Data Architecture: Using the Medallion model (Bronze/Silver/Gold)
✅ ETL Pipelines: Custom SQL scripts to extract, clean, transform, and load data
✅ Data Modeling: Fact and dimension tables for reporting
✅ SQL-based Analytics: Actionable business insights into sales, products, and customers
✅ Dashboard & Reports: Tailored for stakeholders and business teams

💼 Ideal For Roles In:

  • Data Engineer
  • SQL Developer
  • Business Intelligence Analyst
  • ETL Developer
  • Data Architect

🛠️ Tools & Resources

Everything is free to use (SQL Server Express and SSMS are free downloads, though not open source):

| Tool | Purpose |
|------|---------|
| 📂 Datasets | Raw ERP & CRM data in CSV format |
| 🛢️ SQL Server Express | Host your data warehouse locally |
| 🧰 SSMS | GUI for managing SQL Server |
| 🧩 Draw.io | Diagrams for architecture, data flow, models |

🚀 Project Objectives

🛠️ 1. Data Engineering Phase

Build a modern SQL Server-based Data Warehouse.

🔹 Import ERP & CRM CSV files
🔹 Clean and transform data
🔹 Build ETL pipelines using SQL
🔹 Create star schema models (fact + dimension tables)

✅ Focus on latest data only; no historization required.
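One way to read "latest data only" is a full reload of the raw table followed by keeping a single, most recent row per business key. The sketch below illustrates that reading and is not the repository's own pipeline: SQLite stands in for SQL Server, and the CSV content, the `updated_at` column, and the `load_bronze` helper are hypothetical. `ROW_NUMBER()` needs SQLite 3.25 or newer; the same window function exists in T-SQL.

```python
import csv
import io
import sqlite3

# Hypothetical CRM extract; a real run would open a CSV file under datasets/.
raw_csv = io.StringIO(
    "customer_id,city,updated_at\n"
    "1,Pune,2024-01-05\n"
    "1,Mumbai,2024-03-10\n"
    "2,Delhi,2024-02-01\n"
)

conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of SQL Server
conn.execute("CREATE TABLE bronze_customers (customer_id TEXT, city TEXT, updated_at TEXT)")

def load_bronze(connection, csv_file):
    """Full reload: empty the bronze table, then insert every CSV row unchanged."""
    rows = [(r["customer_id"], r["city"], r["updated_at"]) for r in csv.DictReader(csv_file)]
    connection.execute("DELETE FROM bronze_customers")
    connection.executemany("INSERT INTO bronze_customers VALUES (?, ?, ?)", rows)
    return len(rows)

loaded = load_bronze(conn, raw_csv)

# Keep only the newest row per customer; older versions are not preserved.
latest = conn.execute("""
    SELECT customer_id, city
    FROM (
        SELECT customer_id, city,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_at DESC) AS rn
        FROM bronze_customers
    ) AS ranked
    WHERE rn = 1
    ORDER BY customer_id
""").fetchall()
print(loaded, latest)  # 3 [('1', 'Mumbai'), ('2', 'Delhi')]
```

Because nothing is versioned, rerunning the load simply replaces the previous state, which keeps the pipeline simple at the cost of losing change history.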


📊 2. Business Intelligence & Reporting

Uncover key insights through advanced SQL queries:

  • 🔍 Customer Behavior Analysis
  • 📦 Product Performance Trends
  • 💰 Sales & Revenue Insights

These insights drive data-informed decision-making.
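As an illustration of the kind of query this phase involves, the sketch below aggregates a tiny star schema by product category and finds the top-spending customer. The tables, columns, and figures are made up for the example, and SQLite again stands in for SQL Server (T-SQL would use `TOP 1` rather than `LIMIT 1`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of SQL Server
conn.executescript("""
-- Hypothetical star schema: one fact table, two dimensions
CREATE TABLE dim_products  (product_key  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_customers (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE fact_sales (
    order_number TEXT, product_key INTEGER, customer_key INTEGER, sales_amount INTEGER
);

INSERT INTO dim_products  VALUES (1, 'Bikes'), (2, 'Accessories');
INSERT INTO dim_customers VALUES (10, 'Ana'), (11, 'Ben');
INSERT INTO fact_sales VALUES
    ('SO1', 1, 10, 1200),
    ('SO2', 2, 10, 50),
    ('SO3', 1, 11, 900),
    ('SO4', 2, 11, 30);
""")

# Product performance: revenue per category, highest first
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_amount) AS revenue
    FROM fact_sales AS f
    JOIN dim_products AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY revenue DESC
""").fetchall()

# Customer behavior: the single highest-spending customer
top_customer = conn.execute("""
    SELECT c.customer_name, SUM(f.sales_amount) AS total_spent
    FROM fact_sales AS f
    JOIN dim_customers AS c ON c.customer_key = f.customer_key
    GROUP BY c.customer_name
    ORDER BY total_spent DESC
    LIMIT 1
""").fetchone()

print(revenue_by_category)  # [('Bikes', 2100), ('Accessories', 80)]
print(top_customer)         # ('Ana', 1250)
```

Both queries follow the same pattern: join the fact table to one dimension, group by a descriptive attribute, and aggregate a measure.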

📄 For requirements, check: docs/requirements.md


📂 Repository Structure

data-warehouse-project/
│
├── datasets/                           # Raw datasets used for the project (ERP and CRM data)
│
├── docs/                               # Project documentation and architecture details
│   ├── etl.drawio                      # Draw.io file showing the different ETL techniques and methods
│   ├── data_architecture.drawio        # Draw.io file showing the project's architecture
│   ├── data_catalog.md                 # Catalog of datasets, including field descriptions and metadata
│   ├── data_flow.drawio                # Draw.io file for the data flow diagram
│   ├── data_models.drawio              # Draw.io file for data models (star schema)
│   ├── naming-conventions.md           # Consistent naming guidelines for tables, columns, and files
│
├── scripts/                            # SQL scripts for ETL and transformations
│   ├── bronze/                         # Scripts for extracting and loading raw data
│   ├── silver/                         # Scripts for cleaning and transforming data
│   ├── gold/                           # Scripts for creating analytical models
│
├── tests/                              # Test scripts and quality files
│
├── README.md                           # Project overview and instructions
├── LICENSE                             # License information for the repository
├── .gitignore                          # Files and directories to be ignored by Git
└── requirements.txt                    # Dependencies and requirements for the project
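The contents of tests/ are not shown here, so the following is only a guess at what a quality check for the Silver layer might look like: each check is a query that counts offending rows, and a result of 0 means the check passes. The table, its sample rows, and the check names are hypothetical, and SQLite stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of SQL Server
conn.executescript("""
-- Hypothetical Silver table seeded with deliberate problems
CREATE TABLE silver_customers (customer_id INTEGER, first_name TEXT);
INSERT INTO silver_customers VALUES (1, 'Ana'), (2, ' Ben'), (2, 'Ben'), (NULL, 'Cleo');
""")

# Each check returns the number of violations it found.
checks = {
    "duplicate_or_null_keys": """
        SELECT COUNT(*) FROM (
            SELECT customer_id
            FROM silver_customers
            GROUP BY customer_id
            HAVING COUNT(*) > 1 OR customer_id IS NULL
        ) AS bad_keys
    """,
    "untrimmed_names": """
        SELECT COUNT(*) FROM silver_customers
        WHERE first_name != TRIM(first_name)
    """,
}

results = {name: conn.execute(sql).fetchone()[0] for name, sql in checks.items()}
for name, violations in results.items():
    status = "PASS" if violations == 0 else "FAIL"
    print(f"{status} {name}: {violations} violation(s)")
```

With the seeded data, the key check reports two bad keys (the duplicated `2` and the `NULL`) and the name check reports one untrimmed value.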

🔐 License

This project is licensed under the MIT License.
You're free to use, modify, and share it; just keep the copyright and license notice. 🙌


🌟 About Me

👋 Hey there! I'm Pratik Mandalkar, a tech enthusiast passionate about solving real-world problems using Data Engineering, Analytics, and System Design.

💼 Connect with me:

Let's connect, collaborate, and learn together! 🚀


🧠 “Data is the new oil, but insight is the spark that sets it on fire.”
