Every data engineer has that one pipeline. The one that "just works" but nobody really understands anymore. The one that breaks at 2am.

A lot of this week's Databricks updates are quietly aimed at killing exactly that kind of setup 🔥. No more duct-taping your OLTP and OLAP pipelines together. No more "which runtime version was stable again?"

New episode of the Release Notes Podcast is out! 🎙️ Here's what landed:

💠 Lakehouse Sync bridges your operational Lakebase database directly to Delta Lake via CDC. Finally, one platform for both transactional and analytical workloads, no custom ingestion layer needed.
💠 Runtime versioning gets a makeover from v19 onwards: one version, auto-promoted to GA, 3-year LTS. Simpler, more predictable, easier to govern.
💠 Automatic Identity Management with Entra ID and Okta means onboarding a business user to a Genie Space is now a one-click operation instead of a SCIM configuration nightmare.
💠 Genie gets smarter: you can now define workspace-level instructions so your AI assistant actually speaks your company's language across every chat.

👉 On top of that: new managed connectors (HubSpot GA, Outlook/GitHub/Smartsheet in beta), uv pip support in serverless notebooks for faster cold starts, Qwen 3.5 122B available as a hosted model, and Catalog Managed Commits hitting GA for proper multi-table transaction governance. A lot moved from preview to production this cycle.

➡️ Worth a listen 🎙️: https://lnkd.in/dXd8FPRE

#Databricks #DataEngineering #DataPlatform #ReleaseNotes
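For anyone wondering what "via CDC" means in practice, here is a minimal sketch of the general idea: replaying ordered change events from an operational store onto an analytical replica. This is not how Lakehouse Sync is implemented. The event shape (`seq`, `op`, `key`, `row`) and the `apply_cdc` helper are hypothetical, and the replica is a plain in-memory dict.

```python
from typing import Any


def apply_cdc(table: dict[Any, dict], events: list[dict]) -> dict[Any, dict]:
    """Replay change events onto a copy of `table`, ordered by sequence number.

    Hypothetical event shape: {"seq": int, "op": str, "key": Any, "row": dict | None}
    """
    replica = dict(table)
    for event in sorted(events, key=lambda e: e["seq"]):
        op = event["op"]
        if op in ("insert", "update"):
            # Inserts and updates both become "latest row wins" on the replica.
            replica[event["key"]] = event["row"]
        elif op == "delete":
            # Deleting a key that is already gone is treated as a no-op.
            replica.pop(event["key"], None)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return replica


# Events may arrive out of order; the sequence number restores commit order.
events = [
    {"seq": 2, "op": "update", "key": 1, "row": {"status": "shipped"}},
    {"seq": 1, "op": "insert", "key": 1, "row": {"status": "new"}},
    {"seq": 3, "op": "insert", "key": 2, "row": {"status": "new"}},
    {"seq": 4, "op": "delete", "key": 2, "row": None},
]
replica = apply_cdc({}, events)
print(replica)  # {1: {'status': 'shipped'}}
```

Sorting by sequence number is the design choice that matters here: the replica converges to the source state even when events are delivered out of order.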
Databricks Updates Simplify Data Pipelines and Governance
-
🎙️ New episode of Release Notes is out, and it's all about Databricks going all-in on #Genie.

This week Benedek Kiss went solo to break down the latest Databricks release. Gábor Tompa is out, but don't worry, he'll be back next episode!

Databricks One has been renamed to Genie. And yes, the naming can get a bit confusing, but it actually makes sense once you see the full picture:

· Genie = the entire conversational analytics ecosystem (formerly Databricks One)
· Genie Spaces = the domain-specific data chat experience (formerly just "Genie")
· Genie Code = the AI coding assistant (formerly Databricks Assistant)

Everything is Genie. 🧞

Beyond the rebranding, there's a lot of substance in this release. Here are the highlights:

💻 Genie Code builds your Genie Space for you
Ask it to generate instructions, SQL expressions, joins: it inspects your data and does the setup. Like prompting an AI to write its own system prompt.

🗓️ Scheduled tasks via natural language
"Send me a flight delay report every morning at 8 AM": that's the whole setup. Genie creates it, runs it, emails you. No dashboards, no cron jobs.

🤖 GPT-5.5 & 5.5 Pro on Databricks - same day availability
OpenAI ships it on April 23, Databricks has it on April 24. No delay between frontier model releases and your workflows.

✳️ Supervisor agents just got a lot more composable
Custom MCP servers, sub-agents via Databricks Apps, SDK support: the agent framework is quietly becoming very powerful.

🔌 A bunch of GAs including Lakeflow pipeline editor
Code, graph view, and related files in one place. Workspace base environments, Confluence & Zendesk connectors, ABAC all hit GA.

⤵️ Links to release notes and related sites are in the video description: https://lnkd.in/gSAA5KNq

#Databricks #DataEngineering #DataPlatform #ReleaseNotes
Databricks - Release Notes, Decoded #6 - 11/05/2026
https://www.youtube.com/
-
More test data is moving into Databricks. But for many teams, masking that data still means exporting it first. That creates delays, extra pipelines, engineering dependencies and unnecessary compliance risk. DATPROF now supports Databricks, so teams can mask and generate test data directly inside Delta Tables. No exports. No intermediate storage. No complex workarounds. Just secure, compliant test data where your lakehouse data already lives. This helps teams speed up testing, reduce risk and work more independently in modern data platforms. Read the full release here: https://lnkd.in/dcS83Qfk #Databricks #TestDataManagement #DataCompliance #DataPrivacy
-
One thing I’ve learned working with Databricks: fast pipelines don’t always mean reliable data.

Most issues are not job failures. They’re small things that slowly impact trust in data:

→ duplicate records after retries
→ schema drift breaking downstream tables
→ missed updates in incremental loads
→ inconsistent business logic across notebooks

The pipeline is “green,” but the data still has problems.

What helped me most:

✔ idempotent transformations
✔ validation checks during processing
✔ monitoring Delta tables proactively
✔ keeping governance and lineage in place with Unity Catalog

Data engineering is not just about scaling pipelines. It’s about making data dependable.

#Databricks #DataEngineering #DeltaLake #PySpark #AWS
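To make "idempotent transformations" and "validation checks during processing" concrete, here is a small sketch in plain Python rather than PySpark. The field names (`order_id`, `amount`, `updated_at`) and both helpers are hypothetical, and the target table is modelled as a dict keyed by `order_id`. The point it illustrates: a keyed merge can be replayed after a retry without creating duplicates.

```python
def validate(row: dict) -> bool:
    """Reject rows with a missing key, missing timestamp, or a negative/non-numeric amount."""
    amount = row.get("amount")
    return (
        row.get("order_id") is not None
        and "updated_at" in row
        and isinstance(amount, (int, float))
        and amount >= 0
    )


def merge_batch(target: dict, batch: list[dict]) -> tuple[dict, list[dict]]:
    """Upsert valid rows by order_id; return the new table and the rejected rows."""
    merged = dict(target)
    rejected = []
    for row in batch:
        if not validate(row):
            rejected.append(row)
            continue
        current = merged.get(row["order_id"])
        # Keep the newest version of each key, so replaying a batch changes nothing.
        if current is None or row["updated_at"] >= current["updated_at"]:
            merged[row["order_id"]] = row
    return merged, rejected


batch = [
    {"order_id": 1, "amount": 10.0, "updated_at": 1},
    {"order_id": 1, "amount": 12.0, "updated_at": 2},
    {"order_id": 2, "amount": -5, "updated_at": 1},  # fails validation
]
once, rejected = merge_batch({}, batch)
twice, _ = merge_batch(once, batch)  # simulate a retry of the same batch
print(once == twice, len(rejected))  # True 1
```

On a real Delta table the same intent is usually expressed as a keyed merge instead of a blind append; the dict here only stands in for that.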
-
Test data management in Databricks used to come with a lot of friction. If sensitive data lived in Delta Tables, teams often had to export it, mask it elsewhere, and then bring it back into the testing process.

That creates delays. It creates dependencies. And it creates extra risk.

With DATPROF’s new Databricks support, masking can now run directly inside Delta Tables.

What I like about this release is that it also addresses an important Databricks-specific challenge: Delta Table history. Masking only the latest version is not enough if older, unmasked versions remain accessible. DATPROF handles Delta Table versioning automatically, helping teams prevent sensitive data from remaining available in historical versions.

For test teams working in Databricks, this makes secure and compliant testing much more practical.

Read the full release here: https://lnkd.in/dhUJmVQX

#Databricks #DeltaTables #TestDataManagement #DataCompliance
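The history problem is easy to see in a toy model. The sketch below is not DATPROF's implementation and does not use any Delta Lake API. `VersionedTable` and `mask_email` are hypothetical stand-ins for a table that keeps every committed snapshot, loosely mirroring how time travel can still read older versions until their history is removed.

```python
class VersionedTable:
    """Toy stand-in for a table that keeps every committed snapshot."""

    def __init__(self, rows):
        self._versions = [list(rows)]

    def write(self, rows):
        # Every write commits a new version; earlier versions stay readable.
        self._versions.append(list(rows))

    def read(self, version=None):
        if version is None:
            version = len(self._versions) - 1
        snapshot = self._versions[version]
        if snapshot is None:
            raise LookupError(f"version {version} has been removed")
        return list(snapshot)

    def drop_history(self):
        # Remove every snapshot except the latest one.
        for version in range(len(self._versions) - 1):
            self._versions[version] = None


def mask_email(row):
    """Return a copy of the row with the email replaced by a placeholder."""
    return {**row, "email": "***"}


table = VersionedTable([{"id": 1, "email": "ana@example.com"}])
table.write([mask_email(row) for row in table.read()])

print(table.read())   # latest version is masked
print(table.read(0))  # version 0 still exposes the original email

table.drop_history()  # only now is the unmasked snapshot unreachable
```

Masking wrote a new version but left version 0 untouched, which is exactly why history has to be handled as part of the masking run.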
-
💡 Taking my data engineering skills to the next level, starting with Data Ingestion with Lakeflow Connect on the Databricks Learning Path.

Lakeflow Connect is Databricks' managed ingestion service that simplifies bringing data from external sources (including SaaS apps, databases, and file systems) directly into the Lakehouse, without building or maintaining complex pipelines.

Key takeaways from the course:

✅ How to configure and manage ingestion pipelines through the Databricks UI
✅ Understanding Lakeflow's connector ecosystem and how it handles schema evolution automatically
✅ Best practices for landing raw data reliably into Delta Lake

This is just the beginning of the learning path, and I'm excited to keep building toward a solid foundation in modern data engineering. If you're exploring the Databricks ecosystem, I'd highly recommend starting here: the hands-on approach makes the concepts click fast.

#Databricks #DataEngineering #LakeflowConnect #DeltaLake #DataLakehouse #LearningAndDevelopment
-
Databricks is becoming a central platform for test teams. But as more production-like data moves into the lakehouse, one question becomes more important: How do you keep testing realistic without exposing sensitive data? This is why DATPROF now supports Databricks. Test teams can mask and generate test data directly inside Databricks Delta Tables. No exports. No unnecessary copies. No waiting on engineering teams for every test cycle. For organizations working with lakehouse architectures, secure test data management should happen where the data already lives. This helps teams move faster, reduce compliance risk and keep sensitive data protected across modern data platforms. Link to the full release in the comments. #Databricks #TestDataManagement #DataPrivacy #DATPROF
-
Nobody tells you this about Databricks. The platform was built for enterprise teams running production workloads at scale. The interface shows you everything because someone at a Fortune 500 company needs quick access to everything. But you are not that person yet. You do not need Jobs on day one. You do not need Workflows on day one. You do not need Machine Learning on day one. You do not need Partner Connect on day one. You do not need Repos on day one. You need a notebook and a SQL query. That is your entire first week. The engineers who learn fastest are not the ones who explore every menu item. They are the ones who ignore 80% of the sidebar and go deep on the 20% that matters. Workspace. Catalog. SQL Editor. Everything else earns its place when you are ready for it. Not before. #Databricks #DataEngineering #BricksNotes
-
🚀 Optimizing Massive File Processing in Databricks at Scale

Recently worked on a real-time big data processing use case where we handled a single massive file containing billions of records in Databricks. The biggest challenge was improving execution performance while reducing shuffle, memory spill, small file generation, and slow write operations.

By applying smart partitioning strategies, Delta optimizations, Photon runtime, and efficient batch writes, we significantly reduced execution time from 10+ hours to around 2–3 hours. ⚡

Key learnings from this optimization journey:

✅ Proper partitioning improves distributed processing performance
✅ Early filtering & column pruning reduce shuffle overhead
✅ Delta Optimize + Auto Compaction help avoid small file issues
✅ Spark UI is extremely useful for identifying bottlenecks
✅ Efficient write strategy is critical at billion-row scale

#Databricks #PySpark #BigData #DataEngineering #DeltaLake #Azure #ETL #PerformanceOptimization #DataPipeline #CloudComputing #ApacheSpark
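As a rough illustration of why early filtering and column pruning cut shuffle volume, here is a toy sketch in plain Python. It is not the job described above and not Spark code. The column names, the `fields_moved` proxy, and `total_by_country` are all hypothetical, and counting field values only approximates the data a real shuffle would move.

```python
from collections import defaultdict


def fields_moved(rows):
    """Rough proxy for shuffle volume: how many field values get carried forward."""
    return sum(len(row) for row in rows)


def total_by_country(rows, year):
    """Filter and prune before grouping, so only needed values reach the aggregation."""
    slim = [
        {"country": row["country"], "amount": row["amount"]}  # column pruning
        for row in rows
        if row["year"] == year  # early filter
    ]
    totals = defaultdict(float)
    for row in slim:
        totals[row["country"]] += row["amount"]
    return dict(totals), fields_moved(slim)


rows = [
    {"country": "DE", "year": 2024, "amount": 10.0, "note": "x", "customer": "a"},
    {"country": "DE", "year": 2023, "amount": 99.0, "note": "y", "customer": "b"},
    {"country": "FR", "year": 2024, "amount": 5.0, "note": "z", "customer": "c"},
    {"country": "DE", "year": 2024, "amount": 2.5, "note": "w", "customer": "d"},
]
totals, moved = total_by_country(rows, 2024)
print(totals)                     # {'DE': 12.5, 'FR': 5.0}
print(moved, fields_moved(rows))  # 6 20
```

Even on four rows, the grouping step sees 6 values instead of 20. At billion-row scale that gap is where much of the shuffle time goes.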
-
If you want your data assets in Databricks to be accessed only by the people you choose, then Unity Catalog is exactly what you need.

If you have worked with Databricks, you probably already know how powerful Unity Catalog is for data governance and access control.

I recently built a notebook that covers the major components and interactions with Unity Catalog, including tables, volumes, views, functions, external locations, credentials, and more. It is a practical walkthrough to help understand how everything connects in real use cases.

Check out the doc, and if you want to download the notebook from the repo, the link is in the comments.

#Databricks #UnityCatalog #DataEngineering #DataGovernance #BigData #DataPlatform #Lakehouse #DataArchitecture #CloudComputing #AzureDatabricks #AWS #GCP #MachineLearning #ArtificialIntelligence #DataScience #Analytics #DataSecurity #AccessControl #DataAccess #DataManagement #MetadataManagement #DataOps #MLOps #ETL #ELT #DataPipelines #ModernDataStack #SQL #ApacheSpark #DeltaLake #DataCatalog #DataLake #CloudData #EnterpriseData #TechCommunity #AIEngineering #BackendEngineering #DataSolutions #SoftwareEngineering #OpenSource #TechLearning
-
A few years ago, I thought Data Engineering was just about writing PySpark code and moving data from one place to another.

Then I worked on a real production project… Pipelines failing, duplicate data, dashboards breaking.

That’s when I realized Modern Data Engineering is not just moving data. It’s about building scalable, reliable, real-time systems.

A typical modern pipeline looks like this:

📌 Event Hubs → ADLS Gen2 → Databricks
📌 Bronze → Silver → Gold layers
📌 Transformations using PySpark & Delta Lake
📌 Orchestration using Azure Data Factory
📌 Analytics-ready data for reporting & dashboards

The biggest lesson? Learning tools is important. But understanding architecture is what actually makes you a strong Data Engineer.

💡 Tech Stack: Databricks • PySpark • Delta Lake • ADF • ADLS Gen2 • Event Hubs

If you want to learn how to build end-to-end Data Engineering projects with real industry use cases, let’s connect. I’ll guide you step-by-step on how modern pipelines are actually built in production using Databricks, PySpark, Azure & Delta Lake.

Connect with me on Topmate: https://lnkd.in/gnhqTtZJ

#DataEngineering #Databricks #PySpark #Azure #DeltaLake #BigData
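The Bronze → Silver → Gold idea can be sketched without any cluster at all. The example below uses plain Python lists instead of PySpark and Delta tables. The event fields (`event_id`, `device`, `temp`) and both functions are hypothetical, chosen only to show what each layer is responsible for.

```python
from collections import defaultdict


def to_silver(bronze):
    """Bronze -> Silver: drop malformed rows, cast types, and deduplicate on event_id."""
    seen = set()
    silver = []
    for raw in bronze:
        try:
            event_id, device, temp = raw["event_id"], raw["device"], float(raw["temp"])
        except (KeyError, TypeError, ValueError):
            continue  # malformed rows are skipped here; bronze keeps the raw copy
        if event_id in seen:
            continue  # duplicate delivery of the same event
        seen.add(event_id)
        silver.append({"event_id": event_id, "device": device, "temp": temp})
    return silver


def to_gold(silver):
    """Silver -> Gold: aggregate to a reporting-ready average temperature per device."""
    sums = defaultdict(lambda: [0.0, 0])
    for row in silver:
        acc = sums[row["device"]]
        acc[0] += row["temp"]
        acc[1] += 1
    return {device: total / count for device, (total, count) in sums.items()}


# Bronze holds events exactly as they landed, duplicates and bad values included.
bronze = [
    {"event_id": "a", "device": "d1", "temp": "21.5"},
    {"event_id": "a", "device": "d1", "temp": "21.5"},
    {"event_id": "b", "device": "d1", "temp": "22.5"},
    {"event_id": "c", "device": "d2", "temp": "oops"},
    {"event_id": "d", "device": "d2", "temp": "18.0"},
]
silver = to_silver(bronze)
gold = to_gold(silver)
print(len(silver), gold)  # 3 {'d1': 22.0, 'd2': 18.0}
```

Keeping cleaning in the silver step and aggregation in the gold step means a broken dashboard can be traced back one layer at a time.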