We cut cloud costs by 80%!! 💸📉

The surprising part? The biggest saving wasn't from better code. 🙅♀️ It was from reading the bill properly. 📑🔍

We found:
→ Clusters running all night with no jobs on them. 🌙💤
→ Data stored in CSV (on a distributed system, in 2022!). 📁🚫
→ Jobs running one by one that could run in parallel. 🐢➡️🐎

Migrated to Databricks. Converted to Delta Lake. Fixed the cluster config. 🛠️✨

The Result:
✅ 80% cost reduction.
✅ 85% faster processing.

Sometimes the best engineering is just paying attention. 🧠💡

What's the most surprising inefficiency you've found in a data platform? 👇

#Databricks #DataEngineering #CloudCost #BigData #ApacheSpark #DeltaLake #CloudOptimization #TorontoTech #TorontoDataEngineer
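For anyone curious what "jobs running one by one that could run in parallel" looks like in practice, here is a minimal Python sketch. It is not the pipeline from the post: the job names and the `run_job` function are made up, and it assumes the jobs are independent and mostly wait on I/O, which is when a thread pool helps.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical job names, for illustration only.
JOBS = ["ingest_orders", "ingest_customers", "ingest_products"]


def run_job(name: str) -> str:
    """Stand-in for one independent, I/O-bound pipeline job."""
    time.sleep(0.2)  # simulate waiting on storage or a remote engine
    return f"{name}: done"


def run_sequential(jobs):
    """Run the jobs one after another."""
    return [run_job(job) for job in jobs]


def run_parallel(jobs, max_workers=3):
    """Run the jobs concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, jobs))


if __name__ == "__main__":
    for runner in (run_sequential, run_parallel):
        start = time.perf_counter()
        runner(JOBS)
        print(f"{runner.__name__}: {time.perf_counter() - start:.2f}s")
```

Both runners return the same results; only the wall-clock time differs. Jobs that depend on each other's output still have to run in order.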
Utkarsha Borikar’s Post
More Relevant Posts
-
Most enterprise data platform builds start with the vendor question. AWS or Azure? Snowflake or Databricks?

The cloud and the engine are variables. The architecture is the constant.

Bronze. Silver. Gold.
Source data preserved as-is. Cleaned and conformed across systems. Curated and ready for serving.
Runs on any cloud, with any engine, swappable layer by layer.

𝗪𝗼𝗿𝗸𝘀𝗵𝗼𝗽. Inventory the top 3 use cases. Pick one.
𝗣𝗿𝗼𝘃𝗲. Land Bronze. Promote one Gold use case end to end.
𝗦𝗰𝗮𝗹𝗲. Decide the compute engine when scope is real, not before.

90 days to first Gold. Not 12 months to a half-finished warehouse.

The architecture stays the same. The platform choices are the variables. None of these decisions are one-way doors.

👉 If your team is six months into a data platform build with no shipped use case, the constraint is usually less about budget and more about sequence.

🔗 More on how we approach it → https://lnkd.in/eX7m8jHP

#EnterpriseData #DataEngineering #DataFoundation #SmartData
-
Most data platform conversations start in the wrong place. This is a better way to sequence it. #Snowflake #Databricks #enterprisedataplatforms #SmartData
-
Most Databricks teams can see their costs. Explaining them is a different problem entirely.

Visibility is having the numbers: DBU reports, cloud invoices, usage dashboards. Most teams have all of that.

Explainability is being able to answer the questions those numbers raise:
→ Which job drove last month's spike?
→ Which team owns the spend?
→ Did that cluster optimisation actually reduce total cost, or just shift it from the Databricks bill to the cloud bill?

Neither billing system answers those questions on its own. And bridging them takes more than a notebook.

Our latest article breaks down exactly why: the structural reason the two systems can't reconcile themselves, and why having the data is not the same as having the answer.

Full article linked in the comments →

#Databricks #FinOps #DataEngineering #CostObservability #CloudCost #DataPlatform #lumin8
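To make the "two bills" problem concrete, here is a rough Python sketch of the join those questions imply. Everything in it is an assumption for illustration: the field names (`cluster_id`, `cluster_tag`, `dbu_cost`, `vm_cost`) are hypothetical and do not match any real export schema, and it assumes each cluster runs exactly one job, so a cluster's VM cost can be attributed to that job in full.

```python
from collections import defaultdict

# Hypothetical rows; real Databricks and cloud billing exports look different.
dbu_usage = [
    {"cluster_id": "c1", "job": "nightly_etl", "team": "finance", "dbu_cost": 120.0},
    {"cluster_id": "c2", "job": "ml_training", "team": "science", "dbu_cost": 300.0},
]
cloud_costs = [
    {"cluster_tag": "c1", "vm_cost": 80.0},
    {"cluster_tag": "c2", "vm_cost": 450.0},
    {"cluster_tag": "c2", "vm_cost": 50.0},
]


def total_cost_by_job(dbu_rows, cloud_rows):
    """Combine both bills per (team, job), assuming one job per cluster."""
    # Sum the cloud-side VM cost per cluster tag.
    vm_by_cluster = defaultdict(float)
    for row in cloud_rows:
        vm_by_cluster[row["cluster_tag"]] += row["vm_cost"]

    # Attach the VM cost to the job that ran on that cluster.
    totals = {}
    for row in dbu_rows:
        entry = totals.setdefault((row["team"], row["job"]), {"dbu": 0.0, "vm": 0.0})
        entry["dbu"] += row["dbu_cost"]
        entry["vm"] += vm_by_cluster.get(row["cluster_id"], 0.0)
    return totals


if __name__ == "__main__":
    for (team, job), cost in total_cost_by_job(dbu_usage, cloud_costs).items():
        print(team, job, cost["dbu"] + cost["vm"])
```

The sketch only works because the sample data shares a clean cluster key. Shared clusters, missing tags, and mismatched billing periods are where a simple join like this stops being enough.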
-
Here's a useful #Dataform concept: pre_operations and post_operations.

As the names imply, these are sets of actions that run before and after the main operation (table, view, or #SQL operations).

In practice they let you cleanly:
➡️ Declare and set a variable
➡️ Clean up before inserting into a table
➡️ Compute a watermark for your incremental model
➡️ Log run metadata after execution
➡️ Perform a maintenance task

What's the most interesting use case you've seen for pre_operations / post_operations (or pre_hook / post_hook if you're on #dbt)?

P.S. I'll be presenting a quick practical walkthrough of Dataform later today at #BigQueryDay, hosted by Eon.io and Google Cloud. Link to free registration in the comments.
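The ordering is easier to see in a toy version. The sketch below is plain Python with the standard-library `sqlite3` module, not Dataform or dbt syntax. The helper `run_with_hooks` and all table names are hypothetical. It computes a watermark as a pre-operation, runs an incremental insert as the main operation, and logs a row count as a post-operation.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with made-up source and target tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source (id INTEGER, val TEXT);
        CREATE TABLE target (id INTEGER, val TEXT);
        CREATE TABLE watermark (wm INTEGER);
        CREATE TABLE run_log (rows_after INTEGER);
        INSERT INTO source VALUES (1, 'a'), (2, 'b'), (3, 'c');
    """)
    return conn


def run_with_hooks(conn, pre_ops, main_op, post_ops):
    """Run pre-operations, the main statement, then post-operations in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        for stmt in pre_ops:
            conn.execute(stmt)
        conn.execute(main_op)
        for stmt in post_ops:
            conn.execute(stmt)


# Pre-operations: recompute the watermark from what the target already holds.
PRE_OPS = [
    "DELETE FROM watermark",
    "INSERT INTO watermark SELECT COALESCE(MAX(id), 0) FROM target",
]
# Main operation: load only rows newer than the watermark.
MAIN_OP = (
    "INSERT INTO target "
    "SELECT id, val FROM source WHERE id > (SELECT wm FROM watermark)"
)
# Post-operation: log run metadata.
POST_OPS = ["INSERT INTO run_log SELECT COUNT(*) FROM target"]


def target_count(conn):
    return conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]


if __name__ == "__main__":
    conn = make_demo_db()
    run_with_hooks(conn, PRE_OPS, MAIN_OP, POST_OPS)
    print("rows in target:", target_count(conn))
```

Running it a second time without new source rows inserts nothing, because the watermark is recomputed before each load.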
-
What if you could ask your cloud bill a question and get an answer in one second?

This is a live demo, not a mockup. I'm querying a multi-cloud FOCUS 1.2 dataset (AWS, Azure, Oracle) using an MCP server connected to AWS Athena. No dashboard to build. No SQL to write. Just a question and an answer.

In 90 seconds you'll see:
• Total spend across three clouds
• Effective savings rate (spoiler: it's bad)
• Most expensive resource identified instantly
• Zombie resources burning budget with zero value
• A maturity assessment based on the data

This is what FinOps looks like when you connect AI to your cost data with proper governance. Read-only credentials. Full audit trail. No new attack surface. IT keeps control. Users get answers.

The dashboards aren't going away; they're still the executive view. This fills the gap for every question the dashboard doesn't answer.

If your team is spending $20K+/month across clouds and your savings rate is under 15%, let's talk.

#FinOps #CloudCost #MCP #AWS
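For readers who want to see what sits behind two of those numbers, here is a small Python sketch. It is not the demo's implementation. The sample rows are invented, and it assumes effective savings rate is computed as (list cost minus effective cost) divided by list cost, using the FOCUS column names `ListCost`, `EffectiveCost`, and `ResourceId`. Other definitions of the metric exist.

```python
from collections import defaultdict

# Invented sample rows using FOCUS-style column names.
rows = [
    {"ProviderName": "AWS", "ResourceId": "i-aaa", "ListCost": 1000.0, "EffectiveCost": 900.0},
    {"ProviderName": "Microsoft", "ResourceId": "vm-bbb", "ListCost": 600.0, "EffectiveCost": 600.0},
    {"ProviderName": "Oracle", "ResourceId": "db-ccc", "ListCost": 400.0, "EffectiveCost": 300.0},
]


def effective_savings_rate(rows):
    """Assumed definition: (sum of ListCost - sum of EffectiveCost) / sum of ListCost."""
    list_total = sum(r["ListCost"] for r in rows)
    effective_total = sum(r["EffectiveCost"] for r in rows)
    if list_total == 0:
        return 0.0
    return (list_total - effective_total) / list_total


def most_expensive_resource(rows):
    """Return (ResourceId, total EffectiveCost) for the costliest resource."""
    cost_by_resource = defaultdict(float)
    for r in rows:
        cost_by_resource[r["ResourceId"]] += r["EffectiveCost"]
    return max(cost_by_resource.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(f"savings rate: {effective_savings_rate(rows):.1%}")
    print("top resource:", most_expensive_resource(rows))
```

With these sample rows the rate is 10%, below the 15% threshold mentioned in the post.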
-
We’ve started putting together a tiny index of agentic data tools. Skills, MCPs, and CLIs organized across the data stack: Microsoft Fabric, Databricks, Snowflake, dbt, duckdb, and more. Cloud providers too. 👉 GitHub: https://lnkd.in/etHizZPi #datasystems #aisystems
-
Most cluster sprawl isn't a cluster problem. It's a conversation that never happened. Walked into a company where every consultant on the data team was spinning up their own Databricks cluster. No shared standards. No capacity plan. Just panic when the bill arrived. The fix wasn't more tooling. It was sitting down with the data platform lead and the cloud architect to model the landing zone as a time-series problem rather than a budget exercise. Full story in the comments. #machinelearning #datascience #databricks #mlops #azure #enterprisearchitect
-
Just read about the new Amazon Redshift RG instances powered by AWS Graviton. I think it is an interesting move by AWS. The biggest win may not just be performance or cost improvements, but the simpler architecture: 👉 querying data warehouse + data lake workloads using a more integrated engine. Less operational overhead, better Iceberg support, and a cleaner analytics stack are always a good direction. #AWS #Redshift #DataEngineering #Cloud #Analytics
-
SQL Analytics has always been one of my favourite SquaredUp features - it enables you to combine data from multiple sources and run queries over them as if they were SQL tables. SQL Analytics now ships with our SmartAssist AI assistant - so you don't even need to know any SQL to build queries - just describe the outcomes you need: 👉 "Show me my combined cloud spend across AWS and Azure." 👉 "Now format that to two decimal places." 👉 "Break it down by week and show it as a column chart." I have written a walkthrough in this blog article: https://lnkd.in/ettFY_Jq
-
Started exploring Snowflake today, a cloud data warehouse that's widely used in modern data engineering! Key thing I learned: unlike traditional databases, Snowflake separates storage and compute. This means teams can query the same data on separate compute without slowing each other down! #DataEngineering #Snowflake #LearningInPublic #CloudData