Databricks has introduced Unity Catalog Open APIs, which let external compute engines such as Apache Spark, Flink, DuckDB, and Trino create, read, and write Unity Catalog managed Delta tables directly, while governance, auditing, and fine-grained access control stay centralized. This removes the long-standing trade-off between governance and flexibility: there is a single copy of the data, governed in one place and accessible from any engine. For enterprises running multi-engine environments, that means less data duplication, lower costs, and simpler compliance. What are your thoughts on this direction for open data platforms? #Databricks #UnityCatalog #DataGovernance #DeltaLake #DataEngineering #AIPlatform #Database #Lakehouse #MicrosoftFabric #Microsoft #OpenAPI #DuckDB
Databricks Unity Catalog Open APIs Simplify Data Governance
-
🚨 Databricks Users: A Major Change is Coming!

📅 Mark your calendars: September 30, 2026

Azure Databricks is officially moving to a Unity Catalog-only model for all new deployments. This means the traditional setup many teams still use today will soon become legacy architecture ⚠️

🔥 What’s Changing?
✅ New workspaces → No Hive Metastore
✅ New workspaces → No DBFS root or mounts
✅ New workspaces → Minimum runtime = 13.3 LTS

💡 What does this mean?
Databricks is pushing toward a more:
✔ Secure
✔ Governed
✔ Centralized
✔ Scalable
data platform using Unity Catalog as the standard governance layer.

⚠️ If you're still using:
Hive Metastore
DBFS mounts
Legacy access patterns
…it’s the right time to start planning your Unity Catalog migration.

🚀 Why this matters for Data Engineers
Unity Catalog brings:
🔹 Centralized governance
🔹 Fine-grained access control
🔹 Better auditing & lineage
🔹 Cross-workspace data sharing
🔹 Improved security standards

💬 The Databricks ecosystem is evolving fast, and understanding Unity Catalog is quickly becoming a must-have skill for modern Data Engineers.

Have you started your Unity Catalog migration yet? 👇

#Databricks #AzureDatabricks #UnityCatalog #DataEngineering #BigData #DataGovernance #DeltaLake #CloudComputing #DataArchitecture #Spark #TechUpdates #DataPlatform
-
If you want your data assets in Databricks to be accessed only by the people you choose, then Unity Catalog is exactly what you need. If you have worked with Databricks, you probably already know how powerful Unity Catalog is for data governance and access control. I recently built a notebook that covers the major components of Unity Catalog and how to interact with them, including tables, volumes, views, functions, external locations, credentials, and more. It is a practical walkthrough to help you understand how everything connects in real use cases. Check out the doc, and if you want to download the notebook from the repo, the link is in the comments. #Databricks #UnityCatalog #DataEngineering #DataGovernance #BigData #DataPlatform #Lakehouse #DataArchitecture #CloudComputing #AzureDatabricks #AWS #GCP #MachineLearning #ArtificialIntelligence #DataScience #Analytics #DataSecurity #AccessControl #DataAccess #DataManagement #MetadataManagement #DataOps #MLOps #ETL #ELT #DataPipelines #ModernDataStack #SQL #ApacheSpark #DeltaLake #DataCatalog #DataLake #CloudData #EnterpriseData #TechCommunity #AIEngineering #BackendEngineering #DataSolutions #SoftwareEngineering #OpenSource #TechLearning
-
For years, "open lakehouse" meant your catalog was open — your *managed tables* were a different story. Databricks just shipped three Unity Catalog interoperability pieces that change that posture. 🔧 External engines — Apache Spark, Flink, DuckDB — can now create, read, and write to UC managed Delta tables, including streaming sources and sinks. This is Beta, and it's built on Delta Lake's catalog commits standard, so concurrent writes from outside Databricks are serialized through the catalog rather than competing on raw storage. Delta-Spark 4.2 and Unity Catalog 0.4.1 are the floor; the protocol abstraction lives in Delta Kernel. Credential vending hit GA. That's machine-to-machine OAuth with short-lived, scoped credentials auto-refreshed for long-running pipelines — the thing every platform team has been trying to bolt onto static tokens or PATs. Volume credential vending — same model, but for unstructured data in volumes (PDFs, images, video) — went Public Preview at the same time. The practical shift for an architect: you stop maintaining a parallel set of permissions for every external engine. One copy of the data, one policy layer, scoped temporary credentials per workload. Lineage stays in UC. The PepsiCo callout in the post hints at the migration story — fewer dataset duplications, fewer per-tool access models. Caveat worth noting: ABAC for fine-grained external reads is still "under development," not shipped. If you need row-level or column-level policies enforced on a Flink or DuckDB read path today, you're not there yet. What's the first non-Databricks engine you'd point at a UC managed table — Flink, DuckDB, or something else? Source: https://lnkd.in/eySppXgA #Databricks #UnityCatalog #Lakehouse #DeltaLake #OpenLakehouse #DataEngineering #DataPlatform #DataArchitecture #DataGovernance #PlatformEngineering #ModernDataStack #DataEngineers #SolutionArchitect
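The "short-lived, scoped credentials auto-refreshed for long-running pipelines" idea can be sketched in a few lines. This is only an illustration of the refresh pattern, not the Unity Catalog client: `TemporaryCredential`, `RefreshingCredential`, and the `fetch` callable are hypothetical names, and a real `fetch` would call whatever credential-vending endpoint your engine uses.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TemporaryCredential:
    """Hypothetical shape of a vended credential (not a real UC type)."""
    token: str
    expires_at: float  # seconds, on the same clock the cache uses


class RefreshingCredential:
    """Caches a short-lived credential and renews it shortly before expiry."""

    def __init__(
        self,
        fetch: Callable[[], TemporaryCredential],
        refresh_margin_s: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch              # assumed: asks the catalog for a new credential
        self._margin = refresh_margin_s  # renew this many seconds before expiry
        self._clock = clock
        self._current: Optional[TemporaryCredential] = None

    def get(self) -> TemporaryCredential:
        now = self._clock()
        # Fetch on first use, or once we are inside the refresh margin.
        if self._current is None or now >= self._current.expires_at - self._margin:
            self._current = self._fetch()
        return self._current
```

A long-running job would call `get()` before each batch of storage requests, so it never holds a static token and never has to restart when a credential expires.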
-
#MicrosoftFabric #Spark Native Execution Engine is emerging as an amazingly performant and often surprisingly cheap solution for data engineering at any scale.
Microsoft Data Platform MVP | Data Solutions Architect at Seequality | Expert in Azure, Big Data, BI & Data Architecture | Enabling Data-Driven Innovation
If you’re running #Spark on #MicrosoftFabric, you’re sitting on a massive performance advantage thanks to the Native Execution Engine. This represents a fundamental architectural shift that replaces traditional JVM-based execution with a vectorized C++ path powered by #ApacheGluten and #Velox. The result is a system that handles data engineering workloads with incredible speed. By processing Parquet and Delta datasets more efficiently, you get significantly more value out of your Fabric Capacity Units. Because this is built directly into the infrastructure, you enjoy faster query times and higher throughput for your ETL pipelines without rewriting a single line of Spark code. To verify if the engine is active, check the query plan in the Spark UI. Look for node names ending in the suffix Transformer, NativeFileScan, or VeloxColumnarToRowExec. These suffixes confirm that the Native Execution Engine successfully processed the operation.
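The plan check described above can be automated with a small text scan. This is a minimal sketch: `native_operators` is a hypothetical helper, it only inspects plan text you pass in (for example, text copied from the Spark UI or printed by `df.explain()`), and the sample plan is hand-written for illustration rather than real engine output.

```python
import re

# Suffixes named in the post as signs of the Native Execution Engine.
NATIVE_MARKERS = ("Transformer", "NativeFileScan", "VeloxColumnarToRowExec")


def native_operators(plan_text: str) -> list[str]:
    """Return distinct operator names in the plan that end with a native marker."""
    found: list[str] = []
    for name in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", plan_text):
        if name.endswith(NATIVE_MARKERS) and name not in found:
            found.append(name)
    return found


# Hand-written sample plan, for illustration only.
sample_plan = """
VeloxColumnarToRowExec
+- ProjectExecTransformer [id#0, amount#1]
   +- FilterExecTransformer (amount#1 > 100)
      +- NativeFileScan parquet [id#0, amount#1]
"""

if __name__ == "__main__":
    print(native_operators(sample_plan))
```

An empty result suggests the operation fell back to the regular JVM path, which is worth investigating before assuming the engine is active.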
-
Databricks Delta Lake MERGE: small syntax, big impact

It’s the backbone of idempotent, CDC-driven pipelines in Databricks.

🔹 Design for Idempotency
Use stable keys and deterministic logic so reprocessing doesn’t create duplicates.

🔹 Handle Full CDC (I/U/D)
Drive logic using operation flags instead of overwriting snapshots.

MERGE INTO target t
USING source s
ON t.id = s.id
WHEN MATCHED AND s.op = 'D' THEN DELETE
WHEN MATCHED AND s.updated_at > t.updated_at THEN UPDATE SET *
-- Skip delete events for keys the target never had, so replays stay idempotent
WHEN NOT MATCHED AND s.op <> 'D' THEN INSERT *

🔹 Performance Matters
- Z-ORDER on join keys
- OPTIMIZE to reduce small files
- Filter source to limit target scans

🔹 Concurrency Awareness
Delta uses optimistic concurrency: parallel writes can conflict, so design retries carefully.

MERGE isn’t just an operation; it’s your data contract for correctness at scale.

#Databricks #DeltaLake #DataEngineering #Lakehouse
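To see why this pattern is idempotent, it helps to model the merge rules on a plain dictionary. The sketch below is an in-memory model, not Delta Lake code: `apply_cdc` is a hypothetical function, it assumes at most one change row per key per batch, and it assumes delete events for keys missing from the target are skipped rather than inserted.

```python
def apply_cdc(target: dict, changes: list[dict]) -> dict:
    """Apply CDC rows to an in-memory 'table' keyed by id, mirroring the MERGE rules."""
    result = dict(target)  # leave the caller's table untouched
    for row in changes:
        key = row["id"]
        current = result.get(key)
        if current is not None:
            if row["op"] == "D":
                # WHEN MATCHED AND op = 'D' THEN DELETE
                del result[key]
            elif row["updated_at"] > current["updated_at"]:
                # WHEN MATCHED AND source is newer THEN UPDATE
                result[key] = row
        elif row["op"] != "D":
            # WHEN NOT MATCHED (and not a delete) THEN INSERT
            result[key] = row
    return result


target = {
    1: {"id": 1, "op": "I", "updated_at": 10, "value": "a"},
    2: {"id": 2, "op": "I", "updated_at": 10, "value": "x"},
}
changes = [
    {"id": 1, "op": "U", "updated_at": 20, "value": "b"},   # update
    {"id": 2, "op": "D", "updated_at": 20, "value": None},  # delete
    {"id": 3, "op": "I", "updated_at": 20, "value": "c"},   # insert
    {"id": 4, "op": "D", "updated_at": 20, "value": None},  # delete of unknown key
]

once = apply_cdc(target, changes)
twice = apply_cdc(once, changes)  # replaying the same batch changes nothing
```

The replay works because the timestamp guard rejects rows that are not strictly newer, and because a delete for an absent key is a no-op instead of a new row.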
-
Before Unity Catalog, Databricks governance was a mess nobody liked to admit. Every workspace had its own Hive metastore. Access controls were inconsistent across workspaces. Lineage tracking was either manual or nonexistent. Data sharing between teams meant copying datasets and hoping nothing drifted. Unity Catalog changed all of that with one idea: one metastore to govern everything. Here is what actually changed. Before, a table in workspace A was invisible to workspace B. Now catalogs, schemas, and tables are defined once and visible across every workspace in your account. Your data engineers, data scientists, and analysts all operate from the same hierarchy. Access control went from coarse and inconsistent to column-level and row-level, enforced uniformly across Spark, SQL, and even the REST API. You define it once in Unity Catalog. It applies everywhere. Lineage became automatic. Every read, every write, every transformation is tracked end to end without any manual tagging. You can trace a dashboard metric back to the raw source table in a few clicks. Delta Sharing lets you share live data across clouds and organizations without copying anything. The recipient queries directly from your catalog. Nothing moves. And all of it sits under one metastore, one audit log, one place to manage everything. If you are still running workspace-level Hive metastores in 2026, the migration is worth it. Unity Catalog is not just a governance upgrade. It is a platform maturity upgrade. Are you already on Unity Catalog? What did the migration look like for your team? #Databricks #UnityCatalog #DataEngineering #DataGovernance #DeltaLake #BigData #DataArchitecture
-
Two years ago, Databricks and Snowflake couldn't agree on a table format. Today they both ship Iceberg v3 with deletion vectors, row lineage, and VARIANT support. Snowflake and Microsoft Fabric announced full bi-directional interoperability. Databricks Unity Catalog federates to every major Iceberg catalog. The format wars are over. Openness won. But watch what's happening one layer up. Databricks wants Unity Catalog to be the federated governance plane across all your Iceberg data. Snowflake launched Open Semantic Interchange with 35+ industry partners to define how metadata and semantics travel between platforms. Same pattern we saw with compute a decade ago. Once the storage layer commoditized, the real fight moved to the orchestration and governance layer above it. For data teams, this is the question that matters now: format portability is solved. Catalog portability is not. Your Iceberg tables can live anywhere, but whoever owns the catalog, the lineage, the access policies, and the semantic definitions still owns the switching cost. The format wars ended with a handshake. The control plane wars are just getting started. Where is your team placing its bets on the governance layer? #ApacheIceberg #DataGovernance #DataStrategy
-
AI agents need memory. Most teams solve this by plugging in a separate database. That separate database then becomes a security and compliance problem. This library keeps the memory inside Databricks, where everything already lives. For builders: less time fighting your platform team, more time shipping. For users: agents that actually remember context, without the data governance risk.
Memory is the missing Databricks layer. So we shipped it. pip install lakehouse-memory is live on PyPI today. Every team building AI agents on Databricks eventually does the same thing: bolt on a sidecar vector DB. Pinecone. Weaviate. Pick your flavor. Each one brings its own governance, its own access control, its own lineage story. A parallel data plane you can't actually ship to production. Your platform team knows it. Your security team knows it. The launch slips. Memory shouldn't be a second system. It belongs where your data already lives. lakehouse-memory gives agents three first-class memory primitives (episodic, semantic, and working) backed by Unity Catalog tables and Databricks Vector Search. UC permissions just work. Lineage just works. One governance model, end to end. Drop-in LangChain integration. Apache 2.0. Alpha, but public from day one, because the only way to get this right is in the open. → https://lnkd.in/gUkTJN98 If you're building agents on Databricks and tired of explaining to your platform team why there's a Pinecone line item, I'd love your feedback. #Databricks #UnityCatalog #AIAgents #LLM #OpenSource
-
🚀 Databricks Delta Lake’s Next Evolution: Understanding Catalog Managed Commits (CMC)

The future of Delta Lake is shifting from filesystem-managed transactions to catalog-managed intelligence. With the introduction of Catalog Managed Commits (CMC) in Databricks, Unity Catalog is no longer just a governance layer; it becomes the transaction coordinator for Delta tables.

Here’s why this matters 👇

✅ Traditional Delta Lake
• Each table manages its own transaction log
• Commit coordination happens at the filesystem level
• Conflict detection is table-specific

✅ With Catalog Managed Commits (CMC)
• Unity Catalog becomes the single source of truth
• Commit coordination shifts from storage → catalog
• Enables multi-table ACID transactions
• Improves metadata handling & query planning
• Strengthens governance across engines

💡 Key Benefits:
🔹 Multi-table atomic transactions
🔹 Faster query planning
🔹 Better concurrency handling
🔹 Stronger governance enforcement
🔹 Cross-engine interoperability
🔹 Reduced metadata latency

This is a major architectural evolution for modern Lakehouse systems.

⚠️ Requirements:
• Unity Catalog Managed Tables
• Databricks Runtime 16.4+
• Runtime 18+ to enable/disable on existing tables

📈 Why It Matters for Data Engineers:
CMC is paving the way for:
• enterprise-grade lakehouse transactions
• open catalog interoperability
• external engine write support
• future-ready distributed governance

The Lakehouse architecture is evolving from:
➡️ “Storage-centric” to
➡️ “Catalog-centric”

And this changes everything.

#Databricks #DeltaLake #UnityCatalog #DataEngineering #Lakehouse #BigData #ApacheSpark #AzureDataEngineer #DataArchitecture #CloudComputing #ETL #ELT #AnalyticsEngineering #MicrosoftFabric #DataPlatform
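The shift from storage-level to catalog-level commit coordination can be pictured with a toy model. This is a sketch of the general idea only: `ToyCatalog`, `CommitConflict`, and `commit_with_retry` are hypothetical names, and the code does not reproduce the Delta protocol or any Unity Catalog API. It shows a single arbiter that accepts a commit only if the writer saw the latest version, with losers retrying.

```python
import threading


class CommitConflict(Exception):
    """Raised when a writer's view of the table is stale."""


class ToyCatalog:
    """Toy stand-in for a catalog that arbitrates commits for one table."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._log: list[str] = []  # committed entries, in version order

    def latest_version(self) -> int:
        with self._lock:
            return len(self._log) - 1  # -1 means the table has no commits yet

    def commit(self, expected_version: int, entry: str) -> int:
        """Accept the commit only if the writer built on the latest version."""
        with self._lock:
            latest = len(self._log) - 1
            if expected_version != latest:
                raise CommitConflict(f"expected {expected_version}, latest is {latest}")
            self._log.append(entry)
            return latest + 1


def commit_with_retry(catalog: ToyCatalog, make_entry, max_attempts: int = 5) -> int:
    """Re-read the latest version and retry when another writer wins the race."""
    for _ in range(max_attempts):
        base = catalog.latest_version()
        try:
            return catalog.commit(base, make_entry(base))
        except CommitConflict:
            continue  # someone else committed first; rebuild on the new version
    raise CommitConflict("gave up after retries")
```

Because every writer goes through the same `commit` call, any engine that follows this rule gets the same ordering guarantee, which is the property that makes cross-engine writes plausible.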