Closed
Description
Is your feature request related to a problem or challenge?
When trying to understand what a plan does, DataFusion EXPLAIN plans have all the detail, but are sometimes hard to read
For example if we run EXPLAIN
on ClickBench query 28,
EXPLAIN SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\\.)?([^/]+)/.*$', '\\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM "hits.parquet" WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
Results in this explain plan
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Limit: skip=0, fetch=25 |
| | Sort: l DESC NULLS FIRST, fetch=25 |
| | Projection: regexp_replace(hits.parquet.Referer,Utf8("^https?://(?:www\.)?([^/]+)/.*$"),Utf8("\1")) AS k, AVG(character_length(hits.parquet.Referer)) AS l, COUNT(*) AS c, MIN(hits.parquet.Referer) |
| | Filter: COUNT(*) > Int64(100000) |
| | Aggregate: groupBy=[[regexp_replace(hits.parquet.Referer, Utf8("^https?://(?:www\.)?([^/]+)/.*$"), Utf8("\1"))]], aggr=[[AVG(CAST(character_length(hits.parquet.Referer) AS Float64)), COUNT(UInt8(1)) AS COUNT(*), MIN(hits.parquet.Referer)]] |
| | Filter: hits.parquet.Referer != Utf8("") |
| | TableScan: hits.parquet projection=[Referer], partial_filters=[hits.parquet.Referer != Utf8("")] |
| physical_plan | GlobalLimitExec: skip=0, fetch=25 |
| | SortPreservingMergeExec: [l@1 DESC], fetch=25 |
| | SortExec: TopK(fetch=25), expr=[l@1 DESC] |
| | ProjectionExec: expr=[regexp_replace(hits.parquet.Referer,Utf8("^https?://(?:www\.)?([^/]+)/.*$"),Utf8("\1"))@0 as k, AVG(character_length(hits.parquet.Referer))@1 as l, COUNT(*)@2 as c, MIN(hits.parquet.Referer)@3 as MIN(hits.parquet.Referer)] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | FilterExec: COUNT(*)@2 > 100000 |
| | AggregateExec: mode=FinalPartitioned, gby=[regexp_replace(hits.parquet.Referer,Utf8("^https?://(?:www\.)?([^/]+)/.*$"),Utf8("\1"))@0 as regexp_replace(hits.parquet.Referer,Utf8("^https?://(?:www\.)?([^/]+)/.*$"),Utf8("\1"))], aggr=[AVG(character_length(hits.parquet.Referer)), COUNT(*), MIN(hits.parquet.Referer)] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=Hash([regexp_replace(hits.parquet.Referer,Utf8("^https?://(?:www\.)?([^/]+)/.*$"),Utf8("\1"))@0], 16), input_partitions=16 |
| | AggregateExec: mode=Partial, gby=[regexp_replace(Referer@0, ^https?://(?:www\.)?([^/]+)/.*$, \1) as regexp_replace(hits.parquet.Referer,Utf8("^https?://(?:www\.)?([^/]+)/.*$"),Utf8("\1"))], aggr=[AVG(character_length(hits.parquet.Referer)), COUNT(*), MIN(hits.parquet.Referer)] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | FilterExec: Referer@0 != |
| | ParquetExec: file_groups={16 groups: [[Users/andrewlamb/Downloads/hits.parquet:0..923748528], [Users/andrewlamb/Downloads/hits.parquet:923748528..1847497056], [Users/andrewlamb/Downloads/hits.parquet:1847497056..2771245584], [Users/andrewlamb/Downloads/hits.parquet:2771245584..3694994112], [Users/andrewlamb/Downloads/hits.parquet:3694994112..4618742640], ...]}, projection=[Referer], predicate=Referer@14 != , pruning_predicate=Referer_min@0 != OR != Referer_max@1, required_guarantees=[Referer not in ()] |
| | |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.045 seconds.
This has all the details but it takes non trivial expertise to understand what it is doing. The challenge is even more pronounced for queries with Unions or Joins where some nodes have multiple children
Describe the solution you'd like
I would like DuckDB style explain plans (probably as an option / in addition to the current ones)
andrewlamb@Andrews-MacBook-Pro:~/Downloads$ duckdb
v0.9.2 3c695d7ba9
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D EXPLAIN SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\\.)?([^/]+)/.*$', '\\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits.parquet" WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
┌─────────────────────────────┐
│┌───────────────────────────┐│
││ Physical Plan ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│ TOP_N │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ Top 25 │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ avg(length(hits.Referer)) │
│ DESC │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 0 │
│ l │
│ c │
│ min(Referer) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ FILTER │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ (count_star() > 100000) │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ EC: 99997497 │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_GROUP_BY │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ #0 │
│ count_star() │
│ avg(#1) │
│ min(#2) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ k │
│ length(Referer) │
│ Referer │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ FILTER │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ (Referer != '') │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ EC: 99997497 │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PARQUET_SCAN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ Referer │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ EC: 99997497 │
└───────────────────────────┘
Describe alternatives you've considered
There are several functions already for creating graphviz plans such as this, but I think they get limited use because they can't be displayed directly (they need to be rendered into visual format)
Additional context
No response