Welcome to the Data Warehouse and Analytics Project repository!
This project demonstrates a comprehensive data warehousing and analytics solution, from building a data warehouse to generating actionable insights. Designed as a portfolio project, it highlights industry best practices in data engineering and analytics.
The data architecture for this project follows the Medallion Architecture, with Bronze, Silver, and Gold layers:

- Bronze Layer: Stores raw data as-is from the source systems. Data is ingested from CSV files into a SQL Server database.
- Silver Layer: Cleanses, standardizes, and normalizes the data to prepare it for analysis.
- Gold Layer: Houses business-ready data modeled into a star schema required for reporting and analytics.
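
The three layers can be sketched end to end in a few lines. This is a minimal illustration only, not the project's scripts: it uses Python's built-in `sqlite3` in place of SQL Server so that it runs anywhere, and the table names, column names, and sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw CRM extract; column names and values are assumptions.
RAW_CSV = """cst_id,cst_name,cst_country
1, Alice ,us
2,Bob,DE
2,Bob,DE
"""


def build_layers(conn, raw_csv):
    """Load raw rows (bronze), clean them (silver), expose a view (gold)."""
    cur = conn.cursor()
    # Bronze: store rows exactly as they arrive, with no cleaning.
    cur.execute(
        "CREATE TABLE bronze_crm_cust (cst_id TEXT, cst_name TEXT, cst_country TEXT)"
    )
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    cur.executemany(
        "INSERT INTO bronze_crm_cust VALUES (:cst_id, :cst_name, :cst_country)",
        rows,
    )
    # Silver: cast types, trim whitespace, standardize country codes,
    # and drop rows that become exact duplicates after cleaning.
    cur.execute(
        """
        CREATE TABLE silver_crm_cust AS
        SELECT DISTINCT
            CAST(cst_id AS INTEGER)  AS cst_id,
            TRIM(cst_name)           AS cst_name,
            UPPER(TRIM(cst_country)) AS cst_country
        FROM bronze_crm_cust
        """
    )
    # Gold: business-friendly column names for reporting.
    cur.execute(
        """
        CREATE VIEW gold_dim_customers AS
        SELECT cst_id      AS customer_id,
               cst_name    AS customer_name,
               cst_country AS country
        FROM silver_crm_cust
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_layers(conn, RAW_CSV)
gold_rows = conn.execute(
    "SELECT customer_id, customer_name, country "
    "FROM gold_dim_customers ORDER BY customer_id"
).fetchall()
print(gold_rows)
```

Keeping the bronze table untouched means the cleaning rules in the silver step can be changed and re-run without re-ingesting the source files.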
This project involves:
- Data Architecture: Designing a modern data warehouse using the Medallion Architecture (Bronze, Silver, and Gold layers).
- ETL Pipelines: Extracting, transforming, and loading data from source systems into the warehouse.
- Data Modeling: Developing fact and dimension tables optimized for analytical queries.
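
As a rough sketch of what fact and dimension tables look like in practice, the example below builds one dimension and one fact table and runs a typical analytical query against them. It again uses `sqlite3` rather than SQL Server, and `dim_products`, `fact_sales`, and every column and value in it are hypothetical names chosen for illustration.

```python
import sqlite3


def create_star_schema(conn):
    """Create a tiny star schema: one dimension, one fact table."""
    conn.executescript(
        """
        CREATE TABLE dim_products (
            product_key  INTEGER PRIMARY KEY,  -- surrogate key
            product_name TEXT,
            category     TEXT
        );
        CREATE TABLE fact_sales (
            order_number TEXT,
            product_key  INTEGER REFERENCES dim_products (product_key),
            quantity     INTEGER,
            sales_amount REAL
        );
        INSERT INTO dim_products VALUES
            (1, 'Road Bike', 'Bikes'),
            (2, 'Helmet', 'Accessories');
        INSERT INTO fact_sales VALUES
            ('SO1', 1, 1, 1200.0),
            ('SO2', 2, 2, 60.0),
            ('SO3', 1, 1, 1150.0);
        """
    )


def sales_by_category(conn):
    """Join the fact table to its dimension and total sales per category."""
    return conn.execute(
        """
        SELECT d.category, SUM(f.sales_amount) AS total_sales
        FROM fact_sales AS f
        JOIN dim_products AS d ON d.product_key = f.product_key
        GROUP BY d.category
        ORDER BY d.category
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_star_schema(conn)
totals = sales_by_category(conn)
print(totals)
```

The fact table holds only keys and measures, so descriptive attributes such as `category` live in one place and each analytical query is a single join plus an aggregate.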
This repository is an excellent resource for showcasing expertise in:
- SQL Development
- Data Architecture
- Data Engineering
- ETL Pipeline Development
- Data Modeling
- Data Analytics
The project uses the following tools and resources:
- Datasets: Access to the project dataset (CSV files).
- SQL Server Express: Lightweight server for hosting your SQL database.
- SQL Server Management Studio (SSMS): GUI for managing and interacting with databases.
- Git Repository: Set up a GitHub account and repository to manage, version, and collaborate on your code efficiently.
- DrawIO: Design data architecture, models, flows, and diagrams.
Develop a modern data warehouse using SQL Server to consolidate sales data, enabling analytical reporting and informed decision-making.
- Data Sources: Import data from two source systems (ERP and CRM) provided as CSV files.
- Data Quality: Cleanse and resolve data quality issues prior to analysis.
- Integration: Combine both sources into a single, user-friendly data model designed for analytical queries.
- Scope: Focus on the latest dataset only; historization of data is not required.
- Documentation: Provide clear documentation of the data model to support both business stakeholders and analytics teams.
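
To make the data quality and integration requirements concrete, here is one possible way to merge customer records from the two sources. It is a sketch under assumptions, not the project's actual logic: the field names (`cst_key`, `cid`, `gen`, `bdate`), the sample rows, and the rule that CRM values take precedence with ERP as a fallback are all hypothetical.

```python
def normalize_gender(value):
    """Map raw codes to a standard label; unknown or empty values become 'n/a'."""
    mapping = {"M": "Male", "MALE": "Male", "F": "Female", "FEMALE": "Female"}
    return mapping.get((value or "").strip().upper(), "n/a")


def integrate_customers(crm_rows, erp_rows):
    """Combine CRM and ERP customer rows into one record per CRM customer."""
    # Index ERP rows by customer id for constant-time lookups.
    erp_by_id = {row["cid"].strip(): row for row in erp_rows}
    merged = []
    for crm in crm_rows:
        key = crm["cst_key"].strip()
        erp = erp_by_id.get(key, {})
        # Assumed rule: prefer the CRM value, fall back to ERP when it is missing.
        gender = normalize_gender(crm.get("gender"))
        if gender == "n/a":
            gender = normalize_gender(erp.get("gen"))
        merged.append(
            {
                "customer_key": key,
                "name": crm["name"].strip(),
                "gender": gender,
                "birthdate": erp.get("bdate"),  # None when ERP has no match
            }
        )
    return merged


# Hypothetical sample rows standing in for the two CSV sources.
crm_rows = [
    {"cst_key": "C001", "name": " Alice ", "gender": "F"},
    {"cst_key": "C002", "name": "Bob", "gender": ""},
    {"cst_key": "C003", "name": "Chen", "gender": "M"},
]
erp_rows = [
    {"cid": "C001", "gen": "Female", "bdate": "1990-04-12"},
    {"cid": "C002", "gen": "Male", "bdate": "1985-11-30"},
]

customers = integrate_customers(crm_rows, erp_rows)
for customer in customers:
    print(customer)
```

Customers that exist in only one source are kept rather than dropped, with missing attributes left empty, so the integrated model does not silently lose records.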
data-warehouse-project/
│
├── datasets/                      # Raw datasets used for the project (ERP and CRM data)
│
├── docs/                          # Project documentation and architecture details
│   ├── etl.drawio                 # Draw.io file showing the different ETL techniques and methods
│   ├── data_architecture.drawio   # Draw.io file showing the project's architecture
│   ├── data_catalog.md            # Catalog of datasets, including field descriptions and metadata
│   ├── data_flow.drawio           # Draw.io file for the data flow diagram
│   ├── data_models.drawio         # Draw.io file for data models (star schema)
│   └── naming-conventions.md      # Consistent naming guidelines for tables, columns, and files
│
├── scripts/                       # SQL scripts for ETL and transformations
│   ├── bronze/                    # Scripts for extracting and loading raw data
│   ├── silver/                    # Scripts for cleaning and transforming data
│   └── gold/                      # Scripts for creating analytical models
│
├── tests/                         # Test scripts and quality files
│
├── README.md                      # Project overview and instructions
├── LICENSE                        # License information for the repository
├── .gitignore                     # Files and directories to be ignored by Git
└── requirements.txt               # Dependencies and requirements for the project
Hi! I'm a Data Analyst and aspiring Data Engineer who loves turning raw data into meaningful insights.
In this project, I designed a modern data warehouse using the Medallion Architecture (Bronze, Silver, Gold layers), built ETL pipelines for data integration, and developed fact and dimension models optimized for analytics.
I'm passionate about creating efficient data systems that make decision-making easier and smarter.
This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.