Welcome to the Data Warehouse and Analytics Project repository! π
This project showcases a complete end-to-end data engineering and analytics solution β from ingesting raw data to deriving powerful business insights.
Designed as a portfolio project, it reflects industry best practices in modern data warehousing, ETL pipelines, dimensional modeling, and business intelligence.
The architecture follows the Medallion Architecture with three data layers:
| Layer | Description |
|---|---|
| π₯ Bronze | Stores raw data as-ingested from ERP and CRM systems (CSV files into SQL Server). |
| π₯ Silver | Data is cleaned, standardized, and transformed for analytical readiness. |
| π₯ Gold | Business-ready, analytical data modeled in a star schema. Used for BI and reporting. |
This project includes:
β
Modern Data Architecture β Using Medallion model (Bronze/Silver/Gold)
β
ETL Pipelines β Custom SQL scripts to extract, clean, transform, and load data
β
Data Modeling β Fact and Dimension tables for reporting
β
SQL-based Analytics β Actionable business insights into sales, products, and customers
β
Dashboard & Reports β Tailored for stakeholders and business teams
πΌ Ideal For Roles In:
- Data Engineer
- SQL Developer
- Business Intelligence Analyst
- ETL Developer
- Data Architect
Everything is 100% Free & Open Source:
| Tool | Purpose |
|---|---|
| π Datasets | Raw ERP & CRM data in CSV format |
| π’οΈ SQL Server Express | Host your DW locally |
| π§° SSMS | GUI for managing SQL Server |
| π§© Draw.io | Diagrams for architecture, data flow, models |
Build a modern SQL Server-based Data Warehouse.
πΉ Import ERP & CRM CSV files
πΉ Clean and transform data
πΉ Build ETL pipelines using SQL
πΉ Create star schema models (fact + dimension tables)
β Focus on latest data only; no historization required.
Uncover key insights through advanced SQL queries:
- π Customer Behavior Analysis
- π¦ Product Performance Trends
- π° Sales & Revenue Insights
These insights drive data-informed decision-making.
π For requirements, check: docs/requirements.md
data-warehouse-project/
β
βββ datasets/ # Raw datasets used for the project (ERP and CRM data)
β
βββ docs/ # Project documentation and architecture details
β βββ etl.drawio # Draw.io file shows all different techniquies and methods of ETL
β βββ data_architecture.drawio # Draw.io file shows the project's architecture
β βββ data_catalog.md # Catalog of datasets, including field descriptions and metadata
β βββ data_flow.drawio # Draw.io file for the data flow diagram
β βββ data_models.drawio # Draw.io file for data models (star schema)
β βββ naming-conventions.md # Consistent naming guidelines for tables, columns, and files
β
βββ scripts/ # SQL scripts for ETL and transformations
β βββ bronze/ # Scripts for extracting and loading raw data
β βββ silver/ # Scripts for cleaning and transforming data
β βββ gold/ # Scripts for creating analytical models
β
βββ tests/ # Test scripts and quality files
β
βββ README.md # Project overview and instructions
βββ LICENSE # License information for the repository
βββ .gitignore # Files and directories to be ignored by Git
βββ requirements.txt # Dependencies and requirements for the project
This project is licensed under the MIT License.
You're free to use, modify, and share β just credit appropriately. π
π Hey there! I'm Pratik Mandalkar, a tech enthusiast passionate about solving real-world problems using Data Engineering, Analytics, and System Design.
πΌ Connect with me:
Letβs connect, collaborate, and learn together! π
π§ βData is the new oil, but insight is the spark that sets it on fire.β
