Ezz-Eldin S.’s Post

I’ll admit it: I used to be a die-hard “External Tables only” person.

If you started with Databricks a few years ago, that was the golden rule. Managed tables felt risky: drop the table, and the data vanishes into the DBFS void. Not production-ready.

But if you’re still following that rule with Unity Catalog, you’re working harder than you need to. The game changed.

Here’s what’s different:

Location: Managed tables no longer live in some hidden Databricks bucket. They live in your S3/ADLS/GCS. You own the storage; Databricks just handles the housekeeping.

Safety: The “accidental drop” nightmare? Unity Catalog’s UNDROP gives you a 7-day safety net:

DROP TABLE gold.customers;   -- Oops
UNDROP TABLE gold.customers; -- No panic (within 7 days)

It’s time to stop treating Managed tables like a junior feature. They’re production-ready, cleaner, and honestly? A lot less headache.

When External tables still matter:
→ Multi-platform access: Synapse, Snowflake, or external Spark needs the data
→ Existing data locations: terabytes already in a specific path you can’t move
→ Strict compliance: auditors need exact control over storage paths
→ No Unity Catalog yet: legacy Hive metastore environments

When Managed tables are the right call:
→ Databricks is your primary compute
→ Standard medallion architecture
→ Simpler lifecycle management
→ You trust UC governance (you should)

The updated mental model:
Old: Managed = dev, External = prod.
New: Managed = Databricks-centric, External = multi-platform requirements.

Databricks now recommends Managed as the default with Unity Catalog. The docs changed. The best practices changed. Time to update our assumptions.

Still defaulting to External out of habit?

#Databricks #DeltaLake #UnityCatalog #DataEngineering #Lakehouse
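
The Location and Safety points can be sketched in Databricks SQL. This is a minimal example under several assumptions. The catalog name main is hypothetical. The gold schema follows the post. The external location is assumed to be already registered and granted for the placeholder ADLS path. The table and column names are hypothetical.

```sql
-- Managed table: no LOCATION clause. Unity Catalog writes the files under the
-- managed storage location configured for the metastore, catalog, or schema.
CREATE TABLE main.gold.customers (
  customer_id BIGINT,
  name        STRING
);

-- External table: you pick the path. The path below is a placeholder and must
-- be covered by an external location you have permissions on.
CREATE TABLE main.gold.customers_ext (
  customer_id BIGINT,
  name        STRING
)
LOCATION 'abfss://container@account.dfs.core.windows.net/gold/customers';

-- The detailed output includes a Type row (MANAGED or EXTERNAL) and the Location.
DESCRIBE TABLE EXTENDED main.gold.customers;

-- Lists dropped tables in the schema that are still within the UNDROP window.
SHOW TABLES DROPPED IN main.gold;
```

The only syntactic difference between the two tables is the LOCATION clause. Leaving it out is what hands the storage path, and its cleanup, to Unity Catalog.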

You could also have managed tables in HMS outside of DBFS: just create a schema that points to an external location.
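
In a legacy Hive metastore, that approach looks roughly like the sketch below. The schema name, table name, and abfss path are placeholders. The cluster is assumed to already have credentials for the storage account.

```sql
-- Schema whose default location is your own storage instead of the DBFS root.
CREATE SCHEMA IF NOT EXISTS hive_metastore.sales
LOCATION 'abfss://container@account.dfs.core.windows.net/hms/sales';

-- No LOCATION clause, so this is a managed table. Its files go under the
-- schema location above, and DROP TABLE deletes them along with the metadata.
CREATE TABLE hive_metastore.sales.orders (
  order_id BIGINT,
  amount   DECIMAL(10, 2)
);
```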

My main argument for managed tables is the ability to enable predictive optimization: it saves a lot of Delta table maintenance 🙏 Also, for quite a lot of clients, access to physical storage is restricted for security reasons, which makes deleting a table’s actual data a bit cumbersome.
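
Turning predictive optimization on is a small DDL change. Below is a minimal sketch, assuming a Unity Catalog catalog named main and a schema named gold (both hypothetical) and the privileges needed to alter them. Predictive optimization only applies to Unity Catalog managed tables, which is why it counts as an argument for them here.

```sql
-- Enable predictive optimization for all managed tables in the catalog.
ALTER CATALOG main ENABLE PREDICTIVE OPTIMIZATION;

-- Or enable it for a single schema only.
ALTER SCHEMA main.gold ENABLE PREDICTIVE OPTIMIZATION;
```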

I still think external tables make more sense than managed tables. What do you say?

Thanks Ezz-Eldin S. for sharing! In Databricks (especially with Unity Catalog), managed tables abstract away physical storage completely. Users define a clear logical table name at the metadata level, but the actual data is stored under a GUID-based folder structure in the managed ADLS location, and users have no control over the physical path or folder naming. This is by design. I’m waiting for a feature where even external tables can be managed with meaningful, custom physical names, while still benefiting from Unity Catalog governance.
