We just finished a Databricks data platform for a client. Unity Catalog, isolated data products, full governance stack. Then came a set of lightweight integration APIs. The obvious move was to run them on Databricks too. We ran the numbers first. 🔢 The cost difference was huge. 📉 We chose the cheaper option and deliberately stepped outside the central governance model. Here is why that was the right call, and what the trade-off actually looked like in practice. 🔧 🔗 [link in comments]
CloudBoostUP’s Post
More Relevant Posts
-
Is it worth migrating off Synapse or SQL Server to Databricks? Short answer: for many organizations at scale, yes. Especially if they can pull it off (really) fast and at low risk. I'm joining @Databricks and my colleague @Mihaly Daranyi for a free webinar where we'll talk honestly about what it takes to move off SQL Server and Azure Synapse, and what you gain on the other side. 📅 Tuesday, May 5 | 2:00–3:00 PM CET | ZOOM This isn't just a technical decision; it's a strategic one. We'll cover: 🔹 The business case for moving to a modern data platform 🔹 Architecture and process decisions that make or break projects 🔹 How AI tooling is changing the speed and cost of migrations 👉 Register free: https://lnkd.in/dD_j-b4R
-
𝗖𝗜/𝗖𝗗 𝗶𝘀 𝗻𝗼𝘁 𝗷𝘂𝘀𝘁 𝗮𝗯𝗼𝘂𝘁 𝗱𝗲𝗽𝗹𝗼𝘆𝗶𝗻𝗴 𝗰𝗼𝗱𝗲. 𝗜𝘁’𝘀 𝗮𝗯𝗼𝘂𝘁 𝗯𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗮 𝗿𝗲𝗽𝗿𝗼𝗱𝘂𝗰𝗶𝗯𝗹𝗲 𝗟𝗮𝗸𝗲𝗵𝗼𝘂𝘀𝗲. Entrada Data Engineer William Guzman Daugherty just published a new article on how to implement true CI/CD in Databricks using Terraform and Databricks Asset Bundles. By separating infrastructure and workloads into two layers, teams can eliminate environment drift, improve governance with Unity Catalog, and deploy with confidence across environments. This is a practical breakdown of what production-grade CI/CD actually looks like on Databricks. Check out William’s full deep dive. Link in the comments 👇 #DatabricksPartner #DataEngineering #CloudInfrastructure #Lakehouse
-
Your data pipeline passed every check. Row counts matched. Schema intact. Jobs completed on schedule. And the CFO just presented corrupted numbers to the board. The most expensive data failures are the ones nobody catches. I built a three-chapter framework to fix that — validated with 143 automated tests across 3 open-source Databricks repositories: Chapter 1: Certify every record before downstream consumption (56 tests) Chapter 2: Block Gold refresh when business signal collapses (50 tests) Chapter 3: Prove what your controls actually catch — with ground truth (37 tests) Not a slide deck. Not a demo. Working code you can download and run today on Databricks Free Edition. Swipe through the deck below to see the full framework. #Databricks #DataEngineering #DataQuality #EnterpriseAI #MedallionArchitecture
-
I’ve been exploring zero-copy data sharing and interoperability between Snowflake and Databricks using object storage. Recently, I successfully validated an end-to-end use case in which streaming data from Kafka was written by Databricks SDP in Iceberg v3 format to ADLS and then accessed in Snowflake via an external catalog (CLD). This was true cross-platform interoperability without data duplication for the streaming use case. Next, I’m exploring use cases around sharing this CLD-backed data in Snowflake using data sharing capabilities. I am also keen to evaluate performance when querying externally managed Iceberg tables from Snowflake. Curious to hear from others who are experimenting with similar lakehouse interoperability patterns and open table formats.
-
Data Quality Monitoring in Databricks made easier. Previously, to create an alert in Databricks, we had to write SQL in the query editor and then configure the alert condition. It is easier now: you can create alerts directly from the UI to detect anomalies and maintain data quality. This feature is still in beta. #databricks #dataengineer #dataquality #alerts
-
Now, Snowflake allows you to continue executing the rest of your DML statements while simply logging the errors, without stopping the pipeline. This feature is a game-changer for data engineers who are building fault-tolerant pipelines on Snowflake. For more details, check the documentation: https://lnkd.in/gNmHqfv8 #Snowflake #DataEngineering
-
5 Airflow DAGs, 6 source tables. All of it replaced with a Snowflake-native pattern: Pipe - Stream - Task - MERGE. The Stream tracks net-new rows automatically, and the Task runs only when there's data (SYSTEM$STREAM_HAS_DATA), so no credits are burned on empty runs. When something breaks, one query is enough to see the history. Airflow still earns its place when coordinating across systems. But if the entire pipeline lives inside Snowflake, keeping the orchestrator there removes an entire failure surface. #Snowflake #SQL #DataEngineering #Optimization
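A minimal sketch of what the Stream - Task - MERGE part of that pattern might look like, written as a small Python helper that renders Snowflake SQL. The helper name `stream_task_merge_sql`, the table, column, and warehouse names, the 5-minute schedule, and the choice of an append-only stream are all assumptions for illustration; the post does not show its actual DDL.

```python
def stream_task_merge_sql(source, target, key, columns, warehouse,
                          schedule="5 MINUTE"):
    """Render Snowflake DDL for a stream plus a task that MERGEs new rows.

    All object names are hypothetical placeholders supplied by the caller.
    """
    stream = f"{source}_stream"
    task = f"merge_{target}"

    update_set = ", ".join(f"t.{c} = s.{c}" for c in columns)
    all_cols = [key, *columns]
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"s.{c}" for c in all_cols)

    # Assumption: the source table only receives inserts (e.g. from a pipe),
    # so an append-only stream is enough to expose net-new rows.
    create_stream = (
        f"CREATE OR REPLACE STREAM {stream} ON TABLE {source} "
        f"APPEND_ONLY = TRUE;"
    )

    # The WHEN clause skips the run entirely when the stream is empty.
    create_task = (
        f"CREATE OR REPLACE TASK {task}\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"  SCHEDULE = '{schedule}'\n"
        f"  WHEN SYSTEM$STREAM_HAS_DATA('{stream}')\n"
        f"AS\n"
        f"  MERGE INTO {target} t USING {stream} s ON t.{key} = s.{key}\n"
        f"  WHEN MATCHED THEN UPDATE SET {update_set}\n"
        f"  WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals});"
    )

    # Tasks are created suspended, so they must be resumed explicitly.
    resume_task = f"ALTER TASK {task} RESUME;"
    return [create_stream, create_task, resume_task]


if __name__ == "__main__":
    for stmt in stream_task_merge_sql(
        "raw_orders", "orders", "order_id", ["amount", "status"], "etl_wh"
    ):
        print(stmt, end="\n\n")
```

Consuming the stream inside the MERGE advances its offset, which is what keeps the next run from reprocessing the same rows.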
-
Big shift happening in data engineering right now. Platforms like Snowflake and Microsoft Fabric just reached a major milestone: 👉 Zero-copy data sharing using open table formats like Apache Iceberg What does that actually mean? For the first time, teams can query the same data across different platforms… without moving it, copying it, or duplicating pipelines. That’s a big deal. For years, we’ve built systems around: • data movement • duplication • synchronization Now the model is shifting to: 👉 shared data + shared metadata layer This changes how we think about architecture: 🔹 Less ETL just to move data around 🔹 Fewer duplicated datasets 🔹 More focus on governance and consistency Feels like we’re moving from: “Where is the data?” to “How do we access the same source of truth?” And honestly… this aligns perfectly with the direction of tools like Apache Iceberg. Curious — do you see this reducing complexity in your environment, or introducing new challenges? #dataengineering #dataplatform #apacheiceberg #bigdata #architecture
-
𝗜𝗳 𝘆𝗼𝘂'𝗿𝗲 𝗯𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗼𝘂𝘁 𝗱𝗮𝘁𝗮 𝗴𝗼𝘃𝗲𝗿𝗻𝗮𝗻𝗰𝗲 𝗼𝗿 𝗔𝗕𝗔𝗖 𝗶𝗻 𝗗𝗮𝘁𝗮𝗯𝗿𝗶𝗰𝗸𝘀, 𝘁𝗵𝗶𝘀 𝗼𝗻𝗲'𝘀 𝗳𝗼𝗿 𝘆𝗼𝘂. Governed Tags in Databricks Unity Catalog are now Generally Available. This is the base layer for ABAC and data governance done right. No more guessing which tables are production vs dev. No more inconsistent labels across teams. Tag it once, enforce it everywhere. 𝗡𝗼𝘄 𝘆𝗼𝘂 𝗰𝗮𝗻 𝘁𝗮𝗴 𝘆𝗼𝘂𝗿 𝗲𝗻𝘃𝗶𝗿𝗼𝗻𝗺𝗲𝗻𝘁𝘀 𝗹𝗶𝗸𝗲: 𝘦𝘯𝘷: 𝘱𝘳𝘰𝘥𝘶𝘤𝘵𝘪𝘰𝘯 | 𝘦𝘯𝘷: 𝘴𝘵𝘢𝘨𝘪𝘯𝘨 | 𝘦𝘯𝘷: 𝘥𝘦𝘷 𝗧𝗮𝗴 𝘆𝗼𝘂𝗿 𝗱𝗮𝘁𝗮 𝘀𝗲𝗻𝘀𝗶𝘁𝗶𝘃𝗶𝘁𝘆: 𝘱𝘪𝘪: 𝘵𝘳𝘶𝘦 | 𝘤𝘭𝘢𝘴𝘴𝘪𝘧𝘪𝘤𝘢𝘵𝘪𝘰𝘯: 𝘳𝘦𝘴𝘵𝘳𝘪𝘤𝘵𝘦𝘥 𝗧𝗮𝗴 𝗼𝘄𝗻𝗲𝗿𝘀𝗵𝗶𝗽: 𝘵𝘦𝘢𝘮: 𝘧𝘪𝘯𝘢𝘯𝘤𝘦 | 𝘥𝘰𝘮𝘢𝘪𝘯: 𝘴𝘶𝘱𝘱𝘭𝘺-𝘤𝘩𝘢𝘪𝘯 Then write access policies against those tags, not against hundreds of individual tables. One customer had 3 different spellings of "𝗽𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻" across their catalog. Governed tags eliminate that. Admins define the allowed tags, everyone else picks from the list. You can also enable Auto Tagging and Databricks takes care of the rest. Stop governing data with spreadsheets and naming conventions. The primitive is here: use it. #DataGovernance #Databricks #UnityCatalog #ABAC #GovernedTags
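To make the "admins define the allowed tags, policies target tags" idea concrete, here is a toy Python model. It is not the Databricks or Unity Catalog API: `ALLOWED_TAGS`, `POLICIES`, `validate_tags`, `can_read`, the group names, and the `pii: false` value are all hypothetical and exist only to illustrate the mechanism described in the post.

```python
# Hypothetical allow-list an admin might define (tag key -> allowed values).
# Values mirror the examples in the post; "false" for pii is an assumption.
ALLOWED_TAGS = {
    "env": {"production", "staging", "dev"},
    "pii": {"true", "false"},
    "classification": {"restricted"},
    "team": {"finance"},
    "domain": {"supply-chain"},
}

# Hypothetical tag-based rules: a table carrying (key, value) can only be
# read by members of the named group.
POLICIES = [
    ("pii", "true", "pii-readers"),
    ("classification", "restricted", "restricted-readers"),
]


def validate_tags(tags):
    """Reject tag keys or values that are not on the allow-list."""
    for key, value in tags.items():
        allowed = ALLOWED_TAGS.get(key)
        if allowed is None:
            raise ValueError(f"unknown tag key: {key!r}")
        if value not in allowed:
            raise ValueError(f"{value!r} is not an allowed value for {key!r}")
    return tags


def can_read(user_groups, table_tags):
    """Apply every policy whose tag matches; all matching rules must pass."""
    validate_tags(table_tags)
    return all(
        group in user_groups
        for key, value, group in POLICIES
        if table_tags.get(key) == value
    )


if __name__ == "__main__":
    table = {"env": "production", "pii": "true"}
    print(can_read({"pii-readers"}, table))
    print(can_read({"analysts"}, table))
```

In this sketch a misspelling such as `env: prod` fails validation up front, and the two rules in `POLICIES` cover every table that carries the matching tag rather than being restated per table.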
-
https://www.cloudboostup.com/from-our-work/serverless-cost-optimization-replacing-databricks-with-flex-consumption