Tools are the fashion; Data Modeling is the skeleton. You can swap Airflow for Prefect, or Spark for DuckDB. But you can't swap bad logic for a faster engine and expect it to work.

In one project, I used Airflow. In another, Spark. Lately, it's all dbt. But 100% of the time, the win came down to Data Modeling fundamentals.

Building a data platform without modeling is like building a skyscraper on a swamp. It doesn't matter how expensive your gold-plated elevators (tools) are if the foundation is sinking.

Here's what actually matters:

𝗗𝗶𝗺𝗲𝗻𝘀𝗶𝗼𝗻𝗮𝗹 𝗠𝗼𝗱𝗲𝗹𝗶𝗻𝗴 = 𝗦𝗽𝗲𝗲𝗱
Star schemas make queries fast. Facts and dimensions separated = happy analysts.

𝗦𝗖𝗗𝘀 𝗪𝗶𝗹𝗹 𝗕𝗶𝘁𝗲 𝗬𝗼𝘂
Skip SCD Type 2 tracking? Debug why historical reports show wrong data at 2 AM.

𝗡𝗼𝗿𝗺𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻 𝗜𝘀𝗻'𝘁 𝗥𝗲𝗹𝗶𝗴𝗶𝗼𝗻
OLTP systems? Normalize for integrity. OLAP systems? Denormalize for speed. Know your world. Design accordingly.

𝗗𝗮𝘁𝗮 𝗩𝗮𝘂𝗹𝘁 = 𝗙𝗹𝗲𝘅𝗶𝗯𝗶𝗹𝗶𝘁𝘆
Business requirements changing weekly? Data Vault keeps you sane. Verbose but bulletproof.

👉 Here are the real non-negotiables:
• Model for how data will be queried, not just stored
• Document your grain—ambiguity kills data trust
• Surrogate keys > natural keys (trust me on this)
• Test your model with real queries before building pipelines

My 2 cents: Master data modeling, and every tool becomes easier. Skip it, and you'll spend your career firefighting broken pipelines.

Are you willing to upskill❓ Explore these resources:
→ Michael K.'s KahanDataSolutions - https://lnkd.in/g4JSFPph
→ Benjamin Rogojan's Seattle Data Guy - https://lnkd.in/ghewnvBX
→ The Data Warehouse Toolkit by Ralph Kimball - https://lnkd.in/dTynC6yD

Image Credits: Shubham Srivastava

Every pipeline you build will eventually be replaced. A solid data model? That becomes the language of the company.

What's one data modeling mistake that cost you hours of debugging? Let's learn together. 👇
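The SCD warning above can be made concrete with a minimal sketch of what Type 2 tracking does: instead of overwriting a changed attribute, the old row is closed out and a new version is opened, so historical reports keep reading the values that were true at the time. The column names (`customer_key`, `valid_from`, `is_current`, etc.) are illustrative, not from the post.

```python
from datetime import date

def apply_scd2(history, key, new_address, today):
    """Hypothetical SCD Type 2 logic over an in-memory dimension.

    history: list of dicts with customer_key, address, valid_from,
    valid_to, and is_current fields.
    """
    current = next(r for r in history
                   if r["customer_key"] == key and r["is_current"])
    if current["address"] == new_address:
        return history                  # no change, nothing to version
    current["is_current"] = False       # expire the old version
    current["valid_to"] = today
    history.append({                    # open the new version
        "customer_key": key,
        "address": new_address,
        "valid_from": today,
        "valid_to": None,
        "is_current": True,
    })
    return history

history = [{"customer_key": 1, "address": "12 Old Rd",
            "valid_from": date(2020, 1, 1), "valid_to": None,
            "is_current": True}]
apply_scd2(history, 1, "99 New Ave", date(2024, 6, 1))
```

Skipping this and overwriting in place (Type 1) is exactly how historical reports end up "wrong" at 2 AM: yesterday's facts get joined to today's attribute values.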
Data Modeling Approaches
Explore top LinkedIn content from expert professionals.
Summary
Data modeling approaches are methods used to structure and organize data so it aligns with business needs, making information easier to understand, access, and use. These approaches help translate real-world processes into frameworks that support analytics, reporting, and operational decision-making.
- Match modeling to needs: Select the right data modeling style based on your goals, such as using normalized models for data integrity or dimensional models for fast analytics.
- Document business context: Map out how data captures key business events and relationships so everyone understands its purpose and usage.
- Test before building: Run real-world queries or simulations on your data model to check its accuracy and usefulness before deploying it in production systems.
-
As we close out the year (and on the back of my 'Best of 2024' article yesterday), I'll post my top Data Ecosystem infographics and posts for the next few weeks. And let's start with Data Modelling!

Data success in business does not start with an AI product, a well-constructed pipeline, or a frequently used dashboard. Success starts with the business model. This then helps inform the org data model. Your 𝐝𝐚𝐭𝐚 𝐦𝐨𝐝𝐞𝐥 𝐰𝐢𝐥𝐥 𝐭𝐮𝐫𝐧 𝐭𝐡𝐞 𝐨𝐫𝐠𝐚𝐧𝐢𝐬𝐚𝐭𝐢𝐨𝐧'𝐬 𝐩𝐫𝐨𝐜𝐞𝐬𝐬𝐞𝐬 𝐢𝐧𝐭𝐨 𝐭𝐚𝐧𝐠𝐢𝐛𝐥𝐞 𝐝𝐚𝐭𝐚 𝐟𝐫𝐚𝐦𝐞𝐰𝐨𝐫𝐤𝐬 𝐚𝐧𝐝 𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞𝐬 to drive business results. Unfortunately, most organisations skip this step and get into a mess of data engineering quick fixes.

Here's a breakdown of the process from business model to each part of the data model:

1. Start with the 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐌𝐨𝐝𝐞𝐥 🏢
Map out how your organisation creates, delivers, and captures value. For example: the company creates product x, ships it to store y, sells it to customer z, and makes money. Within that, what are the data points to track and understand? Map this out and you understand the process of how the organisation makes money.

2. Extending it to the 𝐂𝐨𝐧𝐜𝐞𝐩𝐭𝐮𝐚𝐥 𝐃𝐚𝐭𝐚 𝐌𝐨𝐝𝐞𝐥 🗺️
The conceptual data model builds on top of this, restating how the business operates as a high-level representation of organisational data. This phase involves identifying key data entities and domains and mapping the relationships between them. It should be understandable by business stakeholders, as it is the bridge between business processes and data development.

3. Adding structure with the 𝐋𝐨𝐠𝐢𝐜𝐚𝐥 𝐃𝐚𝐭𝐚 𝐌𝐨𝐝𝐞𝐥 🏙️
Next, we refine into a logical data model. Here, we delve deeper to define the data types, the attributes of each entity, and the nature of the relationships. This model is still independent of technical implementation but sets out a clear structure (often through ERDs or UML) for how data is related and organised.

4. Building the 𝐏𝐡𝐲𝐬𝐢𝐜𝐚𝐥 𝐃𝐚𝐭𝐚 𝐌𝐨𝐝𝐞𝐥 foundations 🏗️
Finally, we arrive at the physical data model.
It's the actual database schema design, complete with tables, columns, data types, and constraints. It also takes into account the performance requirements, optimization techniques, and the physical storage of the data.

Obviously this is a broad simplification of the process. Each step of this journey requires collaboration between business leaders, data architects, and engineers to ensure the data models align with strategic goals, are technically feasible, and are understood by all relevant stakeholders.

Building this properly helps everybody understand how data actually delivers value. Too often we engineer without the architecture to structure it. Fix this and you will be way better off in the long term. #DataModeling #DataEngineering #BusinessModel #DataArchitecture #DataStrategy #DylanDecodes
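As a tiny, hypothetical sketch of step 4: the logical entities finally become a concrete schema with data types, keys, constraints, and an index. The table and column names are invented for illustration, and SQLite is used only to keep the example self-contained.

```python
import sqlite3

# A toy physical data model: the logical entities "product" and "sale"
# become concrete tables with data types, keys, and constraints.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,      -- surrogate key
        product_code TEXT NOT NULL UNIQUE,     -- natural key from source
        product_name TEXT NOT NULL
    );
    CREATE TABLE fact_sale (
        sale_id     INTEGER PRIMARY KEY,
        product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
        quantity    INTEGER NOT NULL CHECK (quantity > 0),
        amount      REAL NOT NULL
    );
    -- Physical-layer concern: an index chosen for the expected query pattern.
    CREATE INDEX ix_fact_sale_product ON fact_sale(product_key);
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```

Note how decisions that were absent at the conceptual and logical levels (exact types, the surrogate key, the index) only appear here, at the physical level.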
-
Difference between Data Object Graphs and Knowledge Graphs

Lots of people have asked me how Data Object Graphs (DOGs) differ from traditional Knowledge Graphs (KGs), so I thought I'd share my perspective.

The Key Difference (TL;DR): While Knowledge Graphs excel at representing what things ARE, Data Object Graphs excel at modelling (and executing) what things DO.

Specifically, Data Object Graphs (DOGs):
- Model behaviour directly and are direct business process implementations
- Map 1:1 to actual business operations, enabling rapid translation from modelling to execution
- Represent the exact business process come to life (e.g., Customer → appliesForLoan → RiskDept → scoresLoan → ReportCreator → createsReport)
- Include specific executable steps implemented by specific executable data product containers (e.g. metrics, ML models, decisions, etc.), with interactions directly mapped to business actions (specific API / graph calls)
- Have agility: business changes can be quickly implemented without complex intermediate abstractions
- Provide immediate operational value through executable modelling
- Faithfully model the real world, i.e. what actually happens in your business, not theoretical abstractions
- Enable simulation of business ideas with real data and execution capabilities

Knowledge Graphs (KGs):
- Tend to focus on formal semantic relationships between conceptual entities
- Emphasize standardized ontologies and taxonomies
- Excel at representing knowledge relationships beyond operational contexts
- Provide semantic reasoning capabilities through formalized structures
- Are designed primarily for knowledge representation rather than process execution

The DOG approach allows organizations to model during business process development rather than at an abstract data level, solving brittleness problems while maintaining enterprise-wide connectivity.
The DOGs can go from business process modelling directly to execution with remarkable speed and agility. This allows organizations to adapt quickly to changing business requirements without sacrificing enterprise-wide visibility.

These are two very valuable approaches with different objectives and goals, but they can be complementary. #DataArchitecture #KnowledgeGraphs #DataObjectGraphs #BusinessProcessModeling #EnterpriseAgility #DataProducts
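As a loose illustration of the "executable" claim (my sketch, not from the original post), the loan example can be modelled as a graph of nodes bound to callables, so walking the graph *is* running the process. All names, the scoring rule, and the graph encoding are hypothetical.

```python
# Each node of the toy Data Object Graph is a step bound to a callable,
# and edges define the order, so the business process runs directly.
def applies_for_loan(ctx):
    ctx["application"] = {"amount": 5000}
    return ctx

def scores_loan(ctx):
    # Invented scoring rule, purely for illustration.
    ctx["score"] = "approved" if ctx["application"]["amount"] <= 10000 else "review"
    return ctx

def creates_report(ctx):
    ctx["report"] = f"loan {ctx['score']}"
    return ctx

# The graph: node name -> (callable, next node). Mirrors the post's
# Customer → appliesForLoan → RiskDept → scoresLoan → ReportCreator chain.
dog = {
    "Customer":      (applies_for_loan, "RiskDept"),
    "RiskDept":      (scores_loan, "ReportCreator"),
    "ReportCreator": (creates_report, None),
}

def execute(graph, start, ctx):
    node = start
    while node is not None:
        step, node = graph[node]
        ctx = step(ctx)
    return ctx

result = execute(dog, "Customer", {})
```

The contrast with a KG is visible even at this scale: a KG would store the *assertion* that customers apply for loans; here, the edge carries the step that performs it.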
-
Want to know the difference between a junior and senior analytics engineer? It's not just SQL skills—it's mastering the art of data modeling.

Most people think data modeling is just "writing SQL transformations". There are more design considerations that go into it:

✅ Facts vs Dimensions - Understanding that facts capture business events while dimensions provide context
✅ Star Schemas - Building central fact tables surrounded by dimension tables to minimize joins and maximize query performance
✅ Slowly Changing Dimensions - Knowing when to overwrite (Type 1) vs. when to preserve history (Type 2)
✅ The Normalization Paradox - Keep your source data clean and normalized (no redundancy), then strategically denormalize for analytics to reduce downstream joins and boost query performance

The reality? Every senior analytics engineer I know didn't just learn these concepts—they practiced them repeatedly until they became second nature.

➡️ What's the most challenging data modeling decision you've faced recently? Drop it in the comments—let's learn from each other's experiences.
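The Type 1 vs Type 2 choice above fits in a few lines. This is an illustrative sketch with invented field names, not a production pattern: Type 1 destroys the old value, Type 2 keeps every version.

```python
def scd_type1(dim_row, new_value):
    dim_row["segment"] = new_value       # destructive overwrite: no history
    return dim_row

def scd_type2(dim_rows, new_value, version):
    dim_rows[-1]["is_current"] = False   # expire the latest version
    dim_rows.append({"segment": new_value,
                     "version": version,
                     "is_current": True})
    return dim_rows

# Type 1: the old value "SMB" is simply gone after the update.
row = {"segment": "SMB"}
scd_type1(row, "Enterprise")

# Type 2: both versions remain, so old facts can still join to old context.
rows = [{"segment": "SMB", "version": 1, "is_current": True}]
scd_type2(rows, "Enterprise", 2)
```

The design question is always the same: does anyone need to report on what the value *used to be*? If yes, Type 2; if no, Type 1 is simpler and cheaper.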
-
Data modeling is one of those concepts that many data engineers don't formally learn through coursework but often master through experience. Here are my top 5 tips.

1. Understand the Business Context
Before you start modeling, deeply understand the business requirements and analytical needs. Engage with stakeholders to identify:
• Key performance indicators (KPIs)
• Critical dimensions and facts
• Required data granularity
• Expected query patterns
Without this context, even the most technically sound model may fail to deliver value.

2. Follow Dimensional Modeling Principles
For analytical workloads, adopting Kimball's dimensional modeling techniques is often the best approach. Key concepts include:
• Fact Tables: Store measurable business events with numeric values (e.g., sales, transactions).
• Dimension Tables: Store descriptive attributes (e.g., customer details, product categories).
• Star Schema: Optimized for performance with fewer joins and simpler queries.
• Snowflake Schema: Normalized dimensions to reduce data redundancy, but requires more joins.

3. Prioritize Data Granularity
Choosing the right grain is critical. Ask: What's the most detailed level of data you'll need for reporting? Will data be aggregated or filtered frequently? The more granular, the better. A clear understanding of granularity ensures your model is efficient and avoids overcomplication.

4. Implement Surrogate Keys
Avoid relying on natural keys directly in your model. Instead, use surrogate keys as primary keys in dimension tables. This enhances performance, simplifies joins, and protects against changes in natural keys.

5. Ensure Data Quality with Metadata Fields
Add essential metadata fields to your tables:
• created_date and last_modified_date for tracking data freshness
• source_system to identify data origins
• ETL_processed_date for tracking pipeline execution
These fields simplify debugging, lineage tracking, and auditability.
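Tips 4 and 5 can be sketched together: mint surrogate keys independently of the natural key, and stamp each row with metadata fields on load. The loader itself is a toy of my own devising; the field names follow the tips above, and `"crm"` / `customer_id` are invented examples.

```python
from datetime import datetime, timezone

def load_dimension(source_rows, existing=None):
    """Toy dimension load: surrogate keys + metadata fields.

    existing maps natural key -> already-minted surrogate key, so
    re-loading the same entity reuses its key.
    """
    existing = existing or {}
    next_key = max(existing.values(), default=0) + 1
    now = datetime.now(timezone.utc).isoformat()
    loaded = []
    for row in source_rows:
        natural_key = row["customer_id"]           # natural key from source
        if natural_key not in existing:
            existing[natural_key] = next_key       # mint a surrogate key
            next_key += 1
        loaded.append({
            "customer_key": existing[natural_key], # surrogate primary key
            "customer_id": natural_key,            # natural key, kept for lineage
            "name": row["name"],
            "source_system": "crm",                # tip 5: data origin
            "etl_processed_date": now,             # tip 5: pipeline run stamp
        })
    return loaded

dim = load_dimension([{"customer_id": "C-7", "name": "Acme"},
                      {"customer_id": "C-9", "name": "Globex"}])
```

The payoff of the surrogate key shows up when the source system renumbers its customers: only the natural-key-to-surrogate mapping changes, while every fact table keeps joining on the stable `customer_key`.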
-
𝗗𝗮𝘁𝗮 𝗠𝗼𝗱𝗲𝗹𝗹𝗶𝗻𝗴 𝗖𝗮𝗻 𝗠𝗮𝗸𝗲 𝗼𝗿 𝗕𝗿𝗲𝗮𝗸 𝗬𝗼𝘂𝗿 𝗦𝘆𝘀𝘁𝗲𝗺—Are You Using the Right One?

How you model data 𝗱𝗲𝗳𝗶𝗻𝗲𝘀 your system's performance, scalability, and analytics power. Choose the wrong model, and you'll fight inefficiencies. With the right one, you build a 𝗵𝗶𝗴𝗵-𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲, 𝗳𝘂𝘁𝘂𝗿𝗲-𝗽𝗿𝗼𝗼𝗳 architecture.

Here are the 𝘁𝗼𝗽 𝗱𝗮𝘁𝗮 𝗺𝗼𝗱𝗲𝗹𝗹𝗶𝗻𝗴 𝘁𝗲𝗰𝗵𝗻𝗶𝗾𝘂𝗲𝘀 you need to know:

⭘ 𝗛𝗶𝗲𝗿𝗮𝗿𝗰𝗵𝗶𝗰𝗮𝗹 𝗠𝗼𝗱𝗲𝗹 – Tree-structured, parent-child relationships. Great for rigid and well-defined relationships.
⭘ 𝗡𝗲𝘁𝘄𝗼𝗿𝗸 𝗠𝗼𝗱𝗲𝗹 – Graph-like structure with multiple parent-child connections. Best for complex relationships.
⭘ 𝗘𝗻𝘁𝗶𝘁𝘆-𝗥𝗲𝗹𝗮𝘁𝗶𝗼𝗻𝘀𝗵𝗶𝗽 𝗠𝗼𝗱𝗲𝗹 – Classic database design, defining entities, attributes, and relationships.
⭘ 𝗥𝗲𝗹𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝗠𝗼𝗱𝗲𝗹 – Tables, columns, and foreign keys. The gold standard for structured databases.
⭘ 𝗗𝗶𝗺𝗲𝗻𝘀𝗶𝗼𝗻𝗮𝗹 𝗠𝗼𝗱𝗲𝗹 – Fact and dimension tables optimised for 𝗮𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 and 𝗱𝗮𝘁𝗮 𝘄𝗮𝗿𝗲𝗵𝗼𝘂𝘀𝗶𝗻𝗴.
⭘ 𝗢𝗯𝗷𝗲𝗰𝘁-𝗢𝗿𝗶𝗲𝗻𝘁𝗲𝗱 𝗠𝗼𝗱𝗲𝗹 – Data as objects with attributes and behaviours. Best for OOP-based applications.
⭘ 𝗗𝗮𝘁𝗮 𝗩𝗮𝘂𝗹𝘁 𝗠𝗼𝗱𝗲𝗹 – Hubs, links, and satellites for scalable and 𝗮𝘂𝗱𝗶𝘁𝗮𝗯𝗹𝗲 data warehousing.
⭘ 𝗚𝗿𝗮𝗽𝗵 𝗠𝗼𝗱𝗲𝗹 – Nodes and edges for 𝗰𝗼𝗺𝗽𝗹𝗲𝘅 𝗿𝗲𝗹𝗮𝘁𝗶𝗼𝗻𝘀𝗵𝗶𝗽𝘀. Perfect for social networks and fraud detection.

𝗬𝗼𝘂𝗿 𝗰𝗵𝗼𝗶𝗰𝗲 𝗼𝗳 𝗱𝗮𝘁𝗮 𝗺𝗼𝗱𝗲𝗹 𝗶𝗺𝗽𝗮𝗰𝘁𝘀 𝗲𝘃𝗲𝗿𝘆𝘁𝗵𝗶𝗻𝗴—𝗳𝗿𝗼𝗺 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝘁𝗼 𝗺𝗮𝗶𝗻𝘁𝗮𝗶𝗻𝗮𝗯𝗶𝗹𝗶𝘁𝘆. Which one do you use the most?
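As a small illustration of the graph model entry above: nodes and edges in an adjacency structure, with the kind of lookup fraud detection relies on (two accounts sharing a device). The account and device identifiers are invented.

```python
from collections import defaultdict

# Undirected graph as an adjacency structure: node -> set of neighbours.
edges = defaultdict(set)

def connect(a, b):
    edges[a].add(b)
    edges[b].add(a)

# Two "different" accounts sharing one device is a classic fraud signal.
connect("account:alice", "device:123")
connect("account:bob", "device:123")

def neighbors_of(node):
    return sorted(edges[node])

shared = neighbors_of("device:123")
```

In a relational model this query is a self-join through a link table; in a graph model it is a one-hop traversal, which is why graph databases shine when the questions are about connections rather than aggregates.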
-
Data Modeling Types!

Understanding the different types of data modeling is crucial for any data professional. Here's a rundown of the essential data modeling types that can transform your data management strategies:

✅ Conceptual Data Modeling - It represents the main concepts and relationships within an organization's data. Think of it as a high-level overview, free from technical constraints
✅ Logical Data Modeling - This is where things get more detailed. It defines the structure of data and the relationships between data entities, regardless of the technology used
✅ Physical Data Modeling - It gets into the nitty-gritty of how data will be stored in a specific database management system with all the necessary specifications
✅ Entity-Relationship Modeling - It's a graphical approach that's indispensable in database design
✅ Relational Modeling - The mathematical underpinning of data in relational databases. It's all about efficient organization and retrieval
✅ Object-Oriented Modeling - Here, data is modeled akin to real-world objects, capturing both data and behavior in a way that aligns with object-oriented programming paradigms
✅ Star Schema Modeling - This model is designed for optimal data retrieval and analysis
✅ Dimensional Modeling - The backbone of data warehousing that organizes data into facts and dimensions, facilitating complex queries and analyses

Each of these models serves different purposes and can be applied across various stages of the data design process. By mastering these, you can ensure your data is structured, stored, and utilized to its full potential.
-
Most people use databases daily… but very few actually understand the data models behind them. And choosing the wrong model can break analytics, slow performance, or make scaling impossible.

Here is a crisp breakdown of the 7 most important data models every analyst, engineer, and PM should know:

1. OLTP (Online Transaction Processing)
Best for real-time operations like banking, ecommerce, and order systems. Fast inserts + strong accuracy.

2. OLAP (Online Analytical Processing)
Built for analytics, trend reports, dashboards, and large historical datasets.

3. Star Schema
One fact table + multiple dimension tables. Simple, fast, and perfect for BI tools.

4. Snowflake Schema
A more normalized version of star schema. Reduces redundancy and improves storage efficiency.

5. Wide Table Model
Single extremely wide table with hundreds of attributes. Great for ML features and fast reads, no joins needed.

6. Normalized Model
Data split into clean, structured tables with relationships. Ensures accuracy, consistency, and minimal duplication.

7. Denormalized Model
Opposite of normalization - fewer tables, faster reads. Ideal for analytics and search systems.

There's no "best" data model, only the best for your use case. Choose based on performance, analytics needs, and operational complexity.
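A minimal star-schema example (model 3 above): one fact table, one dimension, one single-hop join. Table names and figures are invented, and SQLite is used only to keep the sketch self-contained.

```python
import sqlite3

# Toy star schema: every join is one hop from the fact table,
# which is what keeps BI queries simple and fast.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY,
                             region_name TEXT);
    CREATE TABLE fact_orders (order_id INTEGER,
                              region_key INTEGER,
                              amount REAL);
    INSERT INTO dim_region VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO fact_orders VALUES (10, 1, 100.0),
                                   (11, 1, 50.0),
                                   (12, 2, 75.0);
""")
totals = dict(conn.execute("""
    SELECT r.region_name, SUM(f.amount)
    FROM fact_orders f
    JOIN dim_region r ON r.region_key = f.region_key
    GROUP BY r.region_name
"""))
```

In a snowflake variant (model 4), `dim_region` itself would be split further, say into region and country tables, trading one extra join for less redundancy.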
-
Exploring Data Modeling Types

Data modeling is a crucial step in structuring and organizing data within any organization. It serves as a blueprint for building databases and ensures data integrity and efficiency. Let's dive into the various types of data modeling, each serving a unique purpose in the data management lifecycle.

1️⃣ Conceptual Data Modeling
A high-level model representing the main concepts and relationships within an organization's data. It's independent of any specific technology or database.

2️⃣ Logical Data Modeling
This detailed model defines the structure of data and the relationships between data entities. Like conceptual models, logical models are independent of specific technologies.

3️⃣ Physical Data Modeling
Physical data modeling specifies how data will be stored in a particular database management system. It includes details such as table and column names, data types, and indexes.

4️⃣ Entity-Relationship Modeling
A graphical representation of entities and their relationships. It's widely used in database design to visualize data structure.

5️⃣ Relational Modeling
This type involves a mathematical representation of data and is commonly used in relational databases, emphasizing the relationships between data entities.

6️⃣ Object-Oriented Modeling
A graphical representation focusing on entities and their relationships, particularly useful in database design for object-oriented programming environments.

7️⃣ Star Schema Modeling
Used in business intelligence and data warehousing, the star schema organizes data into fact and dimension tables, optimizing queries and data retrieval.

8️⃣ Dimensional Modeling
Similar to star schema modeling, dimensional modeling is used in business intelligence and data warehousing. It structures data into facts and dimensions for effective data analysis.
Understanding these data modeling types helps in choosing the right approach for organizing data, ensuring efficient data management, and facilitating decision-making processes. Whether you're setting up a simple database or designing a complex data warehouse, these models provide the foundation for robust data architecture. #datascience #dbms #sql #ai #datamodeling #rdbms