Skip to content

Implement Nicer / DuckDB style explain plans #9371

Closed
@alamb

Description

@alamb

Is your feature request related to a problem or challenge?

When trying to understand what a plan does, DataFusion EXPLAIN plans have all the detail, but are sometimes hard to read

For example if we run EXPLAIN on ClickBench query 28,

EXPLAIN SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\\.)?([^/]+)/.*$', '\\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM "hits.parquet" WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;

Results in this explain plan

+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Limit: skip=0, fetch=25                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|               |   Sort: l DESC NULLS FIRST, fetch=25                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|               |     Projection: regexp_replace(hits.parquet.Referer,Utf8("^https?://(?:www\.)?([^/]+)/.*$"),Utf8("\1")) AS k, AVG(character_length(hits.parquet.Referer)) AS l, COUNT(*) AS c, MIN(hits.parquet.Referer)                                                                                                                                                                                                                                                                                                                                              |
|               |       Filter: COUNT(*) > Int64(100000)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
|               |         Aggregate: groupBy=[[regexp_replace(hits.parquet.Referer, Utf8("^https?://(?:www\.)?([^/]+)/.*$"), Utf8("\1"))]], aggr=[[AVG(CAST(character_length(hits.parquet.Referer) AS Float64)), COUNT(UInt8(1)) AS COUNT(*), MIN(hits.parquet.Referer)]]                                                                                                                                                                                                                                                                                               |
|               |           Filter: hits.parquet.Referer != Utf8("")                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|               |             TableScan: hits.parquet projection=[Referer], partial_filters=[hits.parquet.Referer != Utf8("")]                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| physical_plan | GlobalLimitExec: skip=0, fetch=25                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
|               |   SortPreservingMergeExec: [l@1 DESC], fetch=25                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
|               |     SortExec: TopK(fetch=25), expr=[l@1 DESC]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
|               |       ProjectionExec: expr=[regexp_replace(hits.parquet.Referer,Utf8("^https?://(?:www\.)?([^/]+)/.*$"),Utf8("\1"))@0 as k, AVG(character_length(hits.parquet.Referer))@1 as l, COUNT(*)@2 as c, MIN(hits.parquet.Referer)@3 as MIN(hits.parquet.Referer)]                                                                                                                                                                                                                                                                                            |
|               |         CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|               |           FilterExec: COUNT(*)@2 > 100000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|               |             AggregateExec: mode=FinalPartitioned, gby=[regexp_replace(hits.parquet.Referer,Utf8("^https?://(?:www\.)?([^/]+)/.*$"),Utf8("\1"))@0 as regexp_replace(hits.parquet.Referer,Utf8("^https?://(?:www\.)?([^/]+)/.*$"),Utf8("\1"))], aggr=[AVG(character_length(hits.parquet.Referer)), COUNT(*), MIN(hits.parquet.Referer)]                                                                                                                                                                                                                 |
|               |               CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|               |                 RepartitionExec: partitioning=Hash([regexp_replace(hits.parquet.Referer,Utf8("^https?://(?:www\.)?([^/]+)/.*$"),Utf8("\1"))@0], 16), input_partitions=16                                                                                                                                                                                                                                                                                                                                                                              |
|               |                   AggregateExec: mode=Partial, gby=[regexp_replace(Referer@0, ^https?://(?:www\.)?([^/]+)/.*$, \1) as regexp_replace(hits.parquet.Referer,Utf8("^https?://(?:www\.)?([^/]+)/.*$"),Utf8("\1"))], aggr=[AVG(character_length(hits.parquet.Referer)), COUNT(*), MIN(hits.parquet.Referer)]                                                                                                                                                                                                                                               |
|               |                     CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
|               |                       FilterExec: Referer@0 !=                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|               |                         ParquetExec: file_groups={16 groups: [[Users/andrewlamb/Downloads/hits.parquet:0..923748528], [Users/andrewlamb/Downloads/hits.parquet:923748528..1847497056], [Users/andrewlamb/Downloads/hits.parquet:1847497056..2771245584], [Users/andrewlamb/Downloads/hits.parquet:2771245584..3694994112], [Users/andrewlamb/Downloads/hits.parquet:3694994112..4618742640], ...]}, projection=[Referer], predicate=Referer@14 != , pruning_predicate=Referer_min@0 !=  OR  != Referer_max@1, required_guarantees=[Referer not in ()] |
|               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.045 seconds.

This has all the details but it takes non trivial expertise to understand what it is doing. The challenge is even more pronounced for queries with Unions or Joins where some nodes have multiple children

Describe the solution you'd like

I would like DuckDB style explain plans (probably as an option / in addition to the current ones)

andrewlamb@Andrews-MacBook-Pro:~/Downloads$ duckdb
v0.9.2 3c695d7ba9
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D EXPLAIN SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\\.)?([^/]+)/.*$', '\\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits.parquet" WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;

┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│           TOP_N           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           Top 25          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│ avg(length(hits.Referer)) │
│            DESC           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             0             │
│             l             │
│             c             │
│        min(Referer)       │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│  (count_star() > 100000)  │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        EC: 99997497       │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│       HASH_GROUP_BY       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             #0            │
│        count_star()       │
│          avg(#1)          │
│          min(#2)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             k             │
│      length(Referer)      │
│          Referer          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│      (Referer != '')      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        EC: 99997497       │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│       PARQUET_SCAN        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          Referer          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│        EC: 99997497       │
└───────────────────────────┘

Describe alternatives you've considered

There are several functions already for creating graphviz plans such as this, but I think they get limited use because they can't be displayed directly (they need to be rendered into visual format)

Additional context

No response

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions