Mure Data’s Post

177 followers

1w Edited

We’ve started putting together a tiny index of agentic data tools: skills, MCPs, and CLIs organized across the data stack, covering Microsoft Fabric, Databricks, Snowflake, dbt, DuckDB, and more. Cloud providers too. 👉 GitHub: https://lnkd.in/etHizZPi #datasystems #aisystems


More Relevant Posts

John Hayes
3w
SQL Analytics has always been one of my favourite SquaredUp features - it enables you to combine data from multiple sources and run queries over them as if they were SQL tables. SQL Analytics now ships with our SmartAssist AI assistant - so you don't even need to know any SQL to build queries - just describe the outcomes you need:
👉 "Show me my combined cloud spend across AWS and Azure."
👉 "Now format that to two decimal places."
👉 "Break it down by week and show it as a column chart."
I have written a walkthrough in this blog article: https://lnkd.in/ettFY_Jq
DQLabs

16,516 followers
3w
Snowflake is not a database. It is a cloud data platform with elastic compute, credit-based pricing, and an expanding ecosystem of dbt, Snowpipe, Tasks, and Streams. So why do most observability tools still monitor it like it is Postgres? #PRIZM was built from the metadata up to understand how Snowflake actually works. Criticality-aware profiling means your CFO's revenue table gets deep checks. The forgotten staging table from 2019 gets nothing. Your credit bill stays predictable even as your catalog grows into tens of thousands of assets. We broke down the 7 layers you actually need to observe and the 10 capabilities to demand before you sign with any vendor. https://lnkd.in/djYSuUpj #DataObservability #Snowflake #DataQuality #AIReadyData #DataEngineering #PRIZM
Furō

2,783 followers
1w
Most Databricks teams can see their costs. Explaining them is a different problem entirely.
Visibility is having the numbers: DBU reports, cloud invoices, usage dashboards. Most teams have all of that.
Explainability is being able to answer the questions those numbers raise:
→ Which job drove last month's spike?
→ Which team owns the spend?
→ Did that cluster optimisation actually reduce total cost, or just shift it from the Databricks bill to the cloud bill?
Neither billing system answers those questions on its own. And bridging them takes more than a notebook.
Our latest article breaks down exactly why: the structural reason the two systems can't reconcile themselves, and why having the data is not the same as having the answer.
Full article linked in the comments →
#Databricks #FinOps #DataEngineering #CostObservability #CloudCost #DataPlatform #lumin8
Raj Praneeth
3w
I have built data platforms on AWS, Azure, and GCP. Here is the truth nobody puts in a blog post: The cloud does not matter as much as you think.
I have seen beautiful AWS architectures that nobody trusted. I have seen a single Snowflake table on GCP that ran a $2B business unit's reporting. I have seen Azure Databricks pipelines that were masterpieces of engineering and took 8 months to deliver zero business value.
The platform is never the problem. The problem is almost always one of these three things:
Nobody agreed on what the data should mean before building the pipeline.
The people consuming the data were never involved in designing it.
The team optimized for technical elegance instead of business outcomes.
I have made all three mistakes. On all three clouds.
The engineers who consistently deliver are not the ones with the most certifications or the most impressive stack. They are the ones who spend the first week asking business questions instead of writing code. The cloud is just infrastructure. Judgment is the actual skill.
Which mistake have you made most often? Be honest.
#DataEngineering #AWS #Azure #GCP #BigData #TechCareers
Nick Treurnicht
3w
The new Knowledge Catalog from Google Cloud looks very interesting and seems to attempt to solve the issue around context and agents. Much around AI these days seems to revolve more around this particular issue than around using model number 3.whatever. https://lnkd.in/gR2rHzK8 Getting charged "per DCU-hour", however ... even the documentation admits it's "an abstract billing unit".

Knowledge Catalog (formerly Dataplex) cloud.google.com
Utkarsha Borikar
3w Edited
We cut cloud costs by 80%!! 💸📉
The surprising part? The biggest saving wasn't from better code. 🙅‍♀️ It was from reading the bill properly. 📑🔍
We found:
→ Clusters running all night with no jobs on them. 🌙💤
→ Data stored in CSV (on a distributed system, in 2022!). 📁🚫
→ Jobs running one by one that could run in parallel. 🐢➡️🐎
Migrated to Databricks. Converted to Delta Lake. Fixed the cluster config. 🛠️✨
The Result:
✅ 80% cost reduction.
✅ 85% faster processing.
Sometimes the best engineering is just paying attention. 🧠💡
What's the most surprising inefficiency you've found in a data platform? 👇
#Databricks #DataEngineering #CloudCost #BigData #ApacheSpark #DeltaLake #CloudOptimization #TorontoTech #TorontoDataEngineer
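One of the inefficiencies listed in the post above, jobs running one by one that could run in parallel, can be sketched roughly as follows. This is a minimal illustration and not the author's actual setup: the job names, the `run_job` stand-in, and the use of Python's standard `concurrent.futures` thread pool are all assumptions made for the example, and the post does not say how the jobs were scheduled on Databricks.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical job names; the post does not say which jobs were involved.
JOBS = ["load_orders", "load_customers", "load_payments"]


def run_job(name: str) -> str:
    """Stand-in for an independent, I/O-bound job (sleep simulates the wait)."""
    time.sleep(0.1)
    return f"{name}: done"


def run_sequential(names: list[str]) -> list[str]:
    """Run jobs one by one; total time is roughly the sum of all job times."""
    return [run_job(name) for name in names]


def run_parallel(names: list[str], max_workers: int = 4) -> list[str]:
    """Run independent jobs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, names))


if __name__ == "__main__":
    # Time both strategies on the same job list.
    for runner in (run_sequential, run_parallel):
        start = time.perf_counter()
        results = runner(JOBS)
        elapsed = time.perf_counter() - start
        print(f"{runner.__name__}: {results} in {elapsed:.2f}s")
```

This only helps when the jobs do not depend on each other's output; jobs with dependencies still need to run in order.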
Carmella (Surdyk) Weatherill
1w
If your team is dealing with fragmented data environments, runaway BigQuery costs, or pipelines that underperform at scale, this session is worth your time. On May 29, Jellyfish Training (Google's North American Cloud Training Partner of the Year) is running a full-day virtual session: Optimizing Agentic Data Workloads on Google Cloud. Built for data architects and engineers, the day covers Knowledge Catalog (formerly Dataplex) and Data Mesh architecture, BigQuery workload management and pricing, Dataflow and batch pipeline tuning, and a FinOps module on budgets and alerting. Two hands-on labs are built into the schedule. Participants who complete the training earn a digital Credly "Enterprise Data Efficiency" badge.
📅 May 29, 2026 | 9:00 AM – 5:00 PM CDT | Virtual
Participation is limited to keep the experience hands-on and high-impact.
🔗 Register here ➡ https://lnkd.in/e_H6UAJu
#GoogleCloud #DataEngineering

Optimizing Agentic Data Workloads on Google Cloud google.smh.re
