Utkarsha Borikar’s Post

3w Edited

We cut cloud costs by 80%!! 💸📉

The surprising part? The biggest saving wasn't from better code. 🙅♀️ It was from reading the bill properly. 📑🔍

We found:
→ Clusters running all night with no jobs on them. 🌙💤
→ Data stored in CSV (on a distributed system, in 2022!). 📁🚫
→ Jobs running one by one that could run in parallel. 🐢➡️🐎

Migrated to Databricks. Converted to Delta Lake. Fixed the cluster config. 🛠️✨

The Result:
✅ 80% cost reduction.
✅ 85% faster processing.

Sometimes the best engineering is just paying attention. 🧠💡

What's the most surprising inefficiency you've found in a data platform? 👇

#Databricks #DataEngineering #CloudCost #BigData #ApacheSpark #DeltaLake #CloudOptimization #TorontoTech #TorontoDataEngineer
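The third finding, jobs running one by one that could run in parallel, can be sketched in a few lines. This is a minimal illustration in plain Python, not the pipeline from the post: `run_job` and the job names are hypothetical placeholders, and it assumes the jobs do not depend on each other.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_job(name: str, seconds: float = 0.2) -> str:
    """Hypothetical stand-in for one independent job (e.g. one table load)."""
    time.sleep(seconds)  # simulates waiting on I/O or a remote cluster
    return f"{name}: done"


# Hypothetical job names, assumed to be independent of each other.
JOBS = ["load_orders", "load_customers", "load_products"]


def run_sequential(jobs):
    """Run jobs one after another; total time is the sum of all jobs."""
    return [run_job(job) for job in jobs]


def run_parallel(jobs, max_workers: int = 3):
    """Run jobs concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, jobs))


if __name__ == "__main__":
    start = time.perf_counter()
    run_sequential(JOBS)
    sequential_s = time.perf_counter() - start

    start = time.perf_counter()
    run_parallel(JOBS)
    parallel_s = time.perf_counter() - start

    print(f"sequential: {sequential_s:.2f}s, parallel: {parallel_s:.2f}s")
```

Threads fit jobs that mostly wait on something else (a cluster, a warehouse, storage). Whether the saving carries over to a real platform depends on whether the jobs are truly independent and whether the cluster has spare capacity to run them side by side.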


More Relevant Posts

Smart Data

15,640 followers
3w Edited
Most enterprise data platform builds start with the vendor question. AWS or Azure? Snowflake or Databricks?

The cloud and the engine are variables. The architecture is the constant.

Bronze. Silver. Gold.
Source data preserved as-is. Cleaned and conformed across systems. Curated and ready for serving.
Runs on any cloud, with any engine, swappable layer by layer.

𝗪𝗼𝗿𝗸𝘀𝗵𝗼𝗽. Inventory the top 3 use cases. Pick one.
𝗣𝗿𝗼𝘃𝗲. Land Bronze. Promote one Gold use case end to end.
𝗦𝗰𝗮𝗹𝗲. Decide the compute engine when scope is real, not before.

90 days to first Gold. Not 12 months to a half-finished warehouse.

The architecture stays the same. The platform choices are the variables. None of these decisions are one-way doors.

👉 If your team is six months into a data platform build with no shipped use case, the constraint is usually less about budget and more about sequence.

🔗 More on how we approach it → https://lnkd.in/eX7m8jHP

#EnterpriseData #DataEngineering #DataFoundation #SmartData
Robyn Miller
2w Edited
Most data platform conversations start in the wrong place. This is a better way to sequence it. #Snowflake #Databricks #enterprisedataplatforms #SmartData
Furō

2,783 followers
1w
Most Databricks teams can see their costs. Explaining them is a different problem entirely.

Visibility is having the numbers: DBU reports, cloud invoices, usage dashboards. Most teams have all of that.

Explainability is being able to answer the questions those numbers raise:
→ Which job drove last month's spike?
→ Which team owns the spend?
→ Did that cluster optimisation actually reduce total cost, or just shift it from the Databricks bill to the cloud bill?

Neither billing system answers those questions on its own. And bridging them takes more than a notebook.

Our latest article breaks down exactly why: the structural reason the two systems can't reconcile themselves, and why having the data is not the same as having the answer.

Full article linked in the comments →

#Databricks #FinOps #DataEngineering #CostObservability #CloudCost #DataPlatform #lumin8
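The "reduce or just shift" question can be made concrete with a small sketch. It is plain Python and not taken from the article: it assumes both bills can already be keyed by a shared cluster identifier (in practice that join is the hard part the post describes), and every name and figure in it is hypothetical.

```python
from collections import defaultdict


def total_cost_by_cluster(dbu_bill, cloud_bill):
    """Combine platform (DBU) and infrastructure (cloud) cost per cluster.

    Each bill is a list of (cluster_id, cost) pairs; the shared cluster_id
    is an assumption of this sketch.
    """
    totals = defaultdict(lambda: {"dbu": 0.0, "cloud": 0.0})
    for cluster_id, cost in dbu_bill:
        totals[cluster_id]["dbu"] += cost
    for cluster_id, cost in cloud_bill:
        totals[cluster_id]["cloud"] += cost
    return {
        cluster_id: {**parts, "total": parts["dbu"] + parts["cloud"]}
        for cluster_id, parts in totals.items()
    }


def classify_change(before, after):
    """Label a before/after pair as a saving, a cost shift, or no saving."""
    if after["total"] < before["total"]:
        return "saving"
    if after["dbu"] < before["dbu"]:
        return "shifted"  # the DBU bill fell, but the combined total did not
    return "no saving"


# Hypothetical figures for one cluster, before and after an optimisation.
before = total_cost_by_cluster([("etl-1", 100.0)], [("etl-1", 50.0)])["etl-1"]
after = total_cost_by_cluster([("etl-1", 60.0)], [("etl-1", 95.0)])["etl-1"]
print(classify_change(before, after))
```

Looking only at the DBU column, this change would read as a 40% saving; the combined total is what shows whether cost actually went down.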
Constantin Lungu

Senior Data Engineer • Contract / Freelancer • Opinions my own
1w
Here's a useful #Dataform concept: pre_operations and post_operations.

As the names imply, these are sets of actions that run before and after the main operation (table, view, or #SQL operations).

In practice they let you cleanly:
➡️ Declare and set a variable
➡️ Clean up before inserting into a table
➡️ Compute a watermark for your incremental model
➡️ Log run metadata after execution
➡️ Perform a maintenance task

What's the most interesting use case you've seen for pre_operations / post_operations (or pre_hook / post_hook if you're on #dbt)?

P.S. I'll be presenting a quick practical walkthrough of Dataform later today at #BigQueryDay, hosted by Eon.io and Google Cloud. Link to free registration in the comments.
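To make the before/main/after pattern concrete, here is a minimal sketch of two of the use cases listed above: computing a watermark before an incremental load and logging run metadata afterwards. It is not Dataform or dbt syntax; it uses plain Python with the standard-library sqlite3 module, and the table and column names (`source`, `target`, `run_log`, `event_id`) are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with hypothetical demo tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source (event_id INTEGER PRIMARY KEY, payload TEXT);
        CREATE TABLE target (event_id INTEGER PRIMARY KEY, payload TEXT);
        CREATE TABLE run_log (watermark INTEGER, rows_inserted INTEGER);
        INSERT INTO source VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO target VALUES (1, 'a');
        """
    )
    return conn


def run_incremental_load(conn: sqlite3.Connection):
    cur = conn.cursor()

    # "Pre-operation": compute the watermark (highest event already loaded).
    (watermark,) = cur.execute(
        "SELECT COALESCE(MAX(event_id), 0) FROM target"
    ).fetchone()

    # Main operation: insert only rows newer than the watermark.
    cur.execute(
        "INSERT INTO target (event_id, payload) "
        "SELECT event_id, payload FROM source WHERE event_id > ?",
        (watermark,),
    )
    inserted = cur.rowcount

    # "Post-operation": log run metadata after the main statement.
    cur.execute(
        "INSERT INTO run_log (watermark, rows_inserted) VALUES (?, ?)",
        (watermark, inserted),
    )
    conn.commit()
    return watermark, inserted


if __name__ == "__main__":
    conn = make_demo_db()
    print(run_incremental_load(conn))  # first run loads the new rows
    print(run_incremental_load(conn))  # second run finds nothing new
```

Keeping the watermark and the logging outside the main statement is the point of the pattern: the main query stays a plain insert, and the surrounding steps can change without touching it.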
Terry Hanks
3w
What if you could ask your cloud bill a question and get an answer in one second?

This is a live demo, not a mockup. I'm querying a multi-cloud FOCUS 1.2 dataset (AWS, Azure, Oracle) using an MCP server connected to AWS Athena. No dashboard to build. No SQL to write. Just a question and an answer.

In 90 seconds you'll see:
• Total spend across three clouds
• Effective savings rate (spoiler: it's bad)
• Most expensive resource identified instantly
• Zombie resources burning budget with zero value
• A maturity assessment based on the data

This is what FinOps looks like when you connect AI to your cost data with proper governance. Read-only credentials. Full audit trail. No new attack surface. IT keeps control. Users get answers.

The dashboards aren't going away; they're still the executive view. This fills the gap for every question the dashboard doesn't answer.

If your team is spending $20K+/month across clouds and your savings rate is under 15%, let's talk.

#FinOps #CloudCost #MCP #AWS

Mure Data

177 followers
1w Edited
We’ve started putting together a tiny index of agentic data tools. Skills, MCPs, and CLIs organized across the data stack: Microsoft Fabric, Databricks, Snowflake, dbt, DuckDB, and more. Cloud providers too. 👉 GitHub: https://lnkd.in/etHizZPi #datasystems #aisystems
Frankfurt MacMoses O - PhD
1w
Most cluster sprawl isn't a cluster problem. It's a conversation that never happened. Walked into a company where every consultant on the data team was spinning up their own Databricks cluster. No shared standards. No capacity plan. Just panic when the bill arrived. The fix wasn't more tooling. It was sitting down with the data platform lead and the cloud architect to model the landing zone as a time-series problem rather than a budget exercise. Full story in the comments. #machinelearning #datascience #databricks #mlops #azure #enterprisearchitect

Vivek Sadh
2w
Just read about the new Amazon Redshift RG instances powered by AWS Graviton. I think it is an interesting move by AWS. The biggest win may not just be performance or cost improvements, but the simpler architecture: 👉 querying data warehouse + data lake workloads using a more integrated engine. Less operational overhead, better Iceberg support, and a cleaner analytics stack are always a good direction. #AWS #Redshift #DataEngineering #Cloud #Analytics
John Hayes
3w
SQL Analytics has always been one of my favourite SquaredUp features - it enables you to combine data from multiple sources and run queries over them as if they were SQL tables. SQL Analytics now ships with our SmartAssist AI assistant - so you don't even need to know any SQL to build queries - just describe the outcomes you need: 👉 "Show me my combined cloud spend across AWS and Azure." 👉 "Now format that to two decimal places." 👉 "Break it down by week and show it as a column chart." I have written a walkthrough in this blog article: https://lnkd.in/ettFY_Jq
Subiya Tabassum
2w
Started exploring Snowflake today, a cloud data warehouse that's widely used in modern data engineering! Key thing I learned: unlike traditional databases, Snowflake separates storage and compute. This means teams can query data independently without slowing each other down! #DataEngineering #Snowflake #LearningInPublic #CloudData
