Bartosz Gajda’s Post

Do you want to build an advanced observability system for your Databricks platform? Check out this blog post, a complete guide on how to do it 👇

Key Highlights:
• 📊 Queryable telemetry from jobs, pipelines, and clusters in system tables
• 🔍 Identify jobs that produce unused data to reduce cost
• ⏱ Detect missing timeouts and long runtimes before SLA breaches
• 🛡 Spot legacy runtime versions for security and performance
• 👥 Pull job owners instantly for faster remediation

I've been using the new System Tables for Lakeflow jobs to turn platform telemetry into a single source of truth. Because the tables are read‑only and live in the system catalog, I can write ordinary SQL to pull job configs, task timelines, cost attribution, and lineage. With a few joins I can surface jobs that write tables nobody reads, flag runs that exceed their expected duration, and list tasks still on deprecated runtimes.

The results feed directly into the Lakeflow dashboard template, giving the whole engineering team a shared view of reliability, cost, and hygiene. Since everything is queryable, audits and alerts become repeatable queries rather than ad‑hoc scripts. In practice this has reduced the time spent chasing 3 AM alerts and made ownership clear for every pipeline.

Which observability signals have helped you catch issues before they impact SLAs?

Tutorial - https://lnkd.in/dQwYciy7

#DatabricksAdministration #DataObservability #Lakeflow
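For anyone who wants to see the shape of the timeout and long-runtime check, here is a minimal sketch in plain Python. It does not query the system tables; it assumes the run records have already been pulled into memory, and the `JobRun` fields, the `flag_runs` helper, the sample data, and the 1.5x slack factor are all hypothetical choices made for illustration, not the schema or logic from the tutorial.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class JobRun:
    """Hypothetical, simplified run record (not the real system table schema)."""
    job_id: str
    owner: str
    timeout_seconds: Optional[int]  # None or 0 is treated here as "no timeout set"
    duration_seconds: float
    expected_seconds: float


def flag_runs(runs: List[JobRun], slack: float = 1.5) -> List[Tuple[str, str, str]]:
    """Return (job_id, owner, issue) for runs without a timeout or running long."""
    findings = []
    for run in runs:
        # A missing timeout means a stuck run could keep going indefinitely.
        if not run.timeout_seconds:
            findings.append((run.job_id, run.owner, "missing_timeout"))
        # A run is "long" if it exceeds its expected duration by the slack factor.
        if run.duration_seconds > slack * run.expected_seconds:
            findings.append((run.job_id, run.owner, "long_runtime"))
    return findings


# Made-up sample data, purely for demonstration.
SAMPLE_RUNS = [
    JobRun("etl_orders", "alice", None, 1200, 900),
    JobRun("ml_features", "bob", 3600, 4000, 2000),
    JobRun("daily_report", "carol", 1800, 600, 600),
]

findings = flag_runs(SAMPLE_RUNS)
for job_id, owner, issue in findings:
    print(f"{job_id} (owner: {owner}): {issue}")
```

Each finding carries the owner alongside the issue, so it can go straight to the right person. In a real setup the same rule would more likely live in a SQL query against the system tables.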
