If your data isn't structurally modelled, you will constantly run into… | Dylan Anderson

Bridging the gap between data and strategy ✦ Head of Data Strategy @ Profusion ✦ Author of The Data Ecosystem newsletter ✦ R Programmer ✦ Policy Nerd

1mo

If your data isn't structurally modelled, you will constantly run into engineering issues 🛑 And I see this in so many different organisations… Here’s a breakdown of four popular data modelling approaches, each with its unique strengths and considerations: 1. 𝐊𝐢𝐦𝐛𝐚𝐥𝐥 ⭐ Approach: Star schema design with a central fact table connected to multiple dimensional tables. Pros: Simple to understand and implement, efficient for querying, and works well with BI tools. Cons: Requires more storage due to denormalization, less flexible for complex relationships, and can be challenging to update and maintain. 2. 𝐈𝐧𝐦𝐨𝐧 🏢 Approach: Top-down, centralized, normalized data warehouse serving as the authoritative data source. Pros: Reduces redundancy, organized by subject area, flexible for changing business requirements. Cons: Complex, time-consuming, resource-intensive, requiring strong engineering support and higher maintenance. 3. 𝐃𝐚𝐭𝐚 𝐕𝐚𝐮𝐥𝐭 🔗 Approach: Modular design combining aspects of Inmon and Kimball, with a hub-and-spoke architecture. Pros: Highly scalable and flexible, excellent for metadata tracking, transparency, and auditability. Cons: Intricate structure, harder to understand and manage, challenging to set up and maintain. 4. 𝐎𝐧𝐞 𝐁𝐢𝐠 𝐓𝐚𝐛𝐥𝐞 (𝐎𝐁𝐓) 🗂️ Approach: Consolidates all data into a single, flat, denormalized table. Pros: Simplifies data retrieval, easy to implement, ideal for simple reporting needs. Cons: Significant data redundancy, storage inefficiencies, poor scalability, and performance issues for large datasets. Being flexible with how you use each one and implement it is crucial to success as well. Don’t just use star schema and call it a day, understand what is needed for your organization and how you may take multiple approaches to get the best result Check out my Data Modelling article from last summer, which goes into what you need to know about the domain from a high-level (subscription link in the comments). And honestly, I think understanding this area is probably one of the most useful things you can learn as a data professional

42 Comments

Dylan Anderson

Bridging the gap between data and strategy ✦ Head of Data Strategy @ Profusion ✦ Author of The Data Ecosystem newsletter ✦ R Programmer ✦ Policy Nerd

1mo

📌 check out the article on this from last summer - https://open.substack.com/pub/thedataecosystem/p/issue-14-the-forgotten-guiding-role?r=8frny&utm_medium=ios

5 Reactions

Anas Riad

Data Analyst & BI Consultant @Adway

1mo

Kimball is widely used in enterprise but lacks flexibility for sure

1 Reaction

Erfan H.

Data Engineer | Transitioned from Analytics to Engineering | | Co-founder and author of Pipeline2Insights (writing for 5K+)

1mo

Great article, I read it before couple of times, specially like the new layer you defined as business model ..

1 Reaction

Nik Walker

Human-Centric Data Engineering, Strategy and Analytics Leader, Talker of Mental Health & Neurodiversity, Titan in tweed, Loser of pens and endorser of all things witty.

1mo

You forgot the best one mate. FIWDIL modeling. "F*** It we'll do it live" - which seems to be the default since 2015.

13 Reactions

Emanuele Melis 💻

No-fluff Data & AI: I take ideas from zero to one. Fast.

1mo

Dylan Anderson that must be the best summary around!

1 Reaction

Jon Cooke

AI Digital Twins | Simulate business ideas in minutes with AI, Data Products and Data Object Graphs (DOGs) | Agent DOG Handler | Composable Enterprises with Data Product Pyramid | Data Product Workshop podcast co-host

1mo

Good summary Dylan Anderson - I'm guessing these are Analytics orientated right (OLAP etc.)? as there are whole classes of OLTP modelling approaches - https://blog.erwin.com/blog/data-modeling-101-oltp-data-modeling-design-and-normalization-for-the-cloud/

3 Reactions

Avalon D'Souza

Director of Analytics | CPA, CA | Applied Data Science

1mo

Would you agree there is a world where you can adopt a hybrid Kimball and inmon approach. The Kimball star schema models serve as a foundational and heavily governed layer. Inmon approach to having a datamart layer for quicker discovery/analysis that reads from the Kimball layer or source. It helps with speed to immediate value + foundation. Also, being intentional to move a datamart dataset to a Kimball approach once kinks have been ironed out

1 Reaction

Mikołaj Sędek

Data Architect @ Limango Polska | Architecting Analytics and ML Solutions - AWS / Databricks / Terraform / PySpark / SQL

1mo

If you don’t have a data model in the warehouse you actual have a model - one that is constantly breaking, failing on any upstream changes and forcing you to constant repairs and patching of failing ETLs 😵💫

2 Reactions

Sebastian Hewing 🚀

Building data & AI foundations for PE/VC-funded consumer companies | Ex-Rocket Internet | Solopreneur & Travel Addict 🌏

1mo

ah, I 🩷 data modeling. I'm usually running with a mix of kimball (core layer), OBT (marts layer) and - depending on the business model - activity schema. Works really well for scale-ups building their first data infra.

1 Reaction

Charlie Guo

Innovator | Salesforce CTA | Enterprise Architect | IT Leader

1mo

I love Data Vault for enterprise data warehousing for changing data, like customer data, which comes from many systems. It's not easy to implement but dbt has some packages to help. It's conceptually so elegant.

1 Reaction

See more comments

To view or add a comment, sign in

Dylan Anderson’s Post

More from this author

Robinhood’s Upcoming Brand Crisis

Explore topics