Skip to content

planner: large estimation error of probe-side of IndexJoin when table filter selectivity is extremely low #64217

@qw4990

Description

@qw4990

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Use the script below to reproduce this issue:

create table tbuild (a int);
insert into tbuild values (1);

create table tprobe (a int, b int, key(a), key(b));

set @@cte_max_recursion_depth=10000;
insert into tprobe select * from (
  with recursive x as (
    select 1 as a, 1 as b
    union all
    select a+1 as a, 1 as b from x where a<10000
  )
  select * from x
)tt;
-- restart TiDB to trigger stats delta update

analyze table tbuild;
analyze table tprobe;

explain analyze select /*+ tidb_inlj(tprobe) */ * from tbuild, tprobe where tbuild.a=tprobe.a and tprobe.b=2;
+------------------------------+---------+---------+-----------+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| id                           | estRows | actRows | task      | access object            | execution info                                                                                                                                                                                                                                                           | operator info                                                                                                                   | memory    | disk |
+------------------------------+---------+---------+-----------+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| IndexJoin_13                 | 1.00    | 0       | root      |                          | time:281.5µs, loops:1, RU:1.426617, inner:{total:156.7µs, concurrency:5, task:1, construct:1.54µs, fetch:150.5µs, build:250ns}, probe:1.04µs                                                                                                                             | inner join, inner:IndexLookUp_12, outer key:test.tbuild.a, inner key:test.tprobe.a, equal cond:eq(test.tbuild.a, test.tprobe.a) | 8.58 KB   | N/A  |
| ├─TableReader_26(Build)      | 1.00    | 1       | root      |                          | time:23.2µs, loops:3, cop_task: {num: 1, max: 42.2µs, proc_keys: 0, copr_cache_hit_ratio: 0.00, build_task_duration: 2.58µs, max_distsql_concurrency: 1}, rpc_info:{Cop:{num_rpc:1, total_time:38.1µs}}                                                                  | data:Selection_25                                                                                                               | 204 Bytes | N/A  |
| │ └─Selection_25             | 1.00    | 1       | cop[tikv] |                          | tikv_task:{time:33.6µs, loops:0}                                                                                                                                                                                                                                         | not(isnull(test.tbuild.a))                                                                                                      | N/A       | N/A  |
| │   └─TableFullScan_24       | 1.00    | 1       | cop[tikv] | table:tbuild             | tikv_task:{time:33.6µs, loops:0}                                                                                                                                                                                                                                         | keep order:false, stats:pseudo                                                                                                  | N/A       | N/A  |
| └─IndexLookUp_12(Probe)      | 1.00    | 0       | root      |                          | time:126.8µs, loops:1, index_task: {total_time: 39.9µs, fetch_handle: 39.5µs, build: 125ns, wait: 250ns}, table_task: {total_time: 62.6µs, num: 1, concurrency: 5}, next: {wait_index: 84µs, wait_table_lookup_build: 0s, wait_table_lookup_resp: 39.9µs}                |                                                                                                                                 | 257.0 KB  | N/A  |
|   ├─Selection_10(Build)      | 9990.00 | 1       | cop[tikv] |                          | time:33.7µs, loops:3, cop_task: {num: 1, max: 22.4µs, proc_keys: 0, copr_cache_hit_ratio: 0.00, build_task_duration: 4.13µs, max_distsql_concurrency: 1}, rpc_info:{Cop:{num_rpc:1, total_time:19.9µs}}, tikv_task:{time:17µs, loops:0}                                  | not(isnull(test.tprobe.a))                                                                                                      | N/A       | N/A  |
|   │ └─IndexRangeScan_8       | 9990.00 | 1       | cop[tikv] | table:tprobe, index:a(a) | tikv_task:{time:17µs, loops:0}                                                                                                                                                                                                                                           | range: decided by [eq(test.tprobe.a, test.tbuild.a)], keep order:false                                                          | N/A       | N/A  |
|   └─Selection_11(Probe)      | 1.00    | 0       | cop[tikv] |                          | time:39.9µs, loops:1, cop_task: {num: 1, max: 23.5µs, proc_keys: 0, copr_cache_hit_ratio: 0.00, build_task_duration: 13.5µs, max_distsql_concurrency: 1, max_extra_concurrency: 1}, rpc_info:{Cop:{num_rpc:1, total_time:21.8µs}}, tikv_task:{time:17.9µs, loops:0}      | eq(test.tprobe.b, 2)                                                                                                            | N/A       | N/A  |
|     └─TableRowIDScan_9       | 9990.00 | 1       | cop[tikv] | table:tprobe             | tikv_task:{time:17.9µs, loops:0}                                                                                                                                                                                                                                         | keep order:false                                                                                                                | N/A       | N/A  |
+------------------------------+---------+---------+-----------+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+-----------+------+

The estimation error of IndexRangeScan_8 is huge, 9990 vs. 1.

The root cause is that when estimating for probe-side of IndexJoin, we consider filters tprobe.b=2 in this case and index join access conditions tprobe.a=tbuild.a SEPARATELY, which means we can't capture there correlation.

In this case, the finally selectivity used for IndexRangeScan_8 is like Selectivity(tprobe.a=?) * Selectiviy(tprobe.b=2), since tprobe.a=? is an index join access condition and tprobe.b=2 is a normal filter, we calculate their selectivity seperately and combine them together with independence assumption.
But the reality is that these 2 columns have hige correlation, for example, after applying tprobe.a=?, there might no row matching tprobe.b=2.

Below is the code where we calculate Selectivity(tprobe.b=2).
Selectivity(tprobe.a=?) is calculated in another place and its result can be reflected in rowCount.
Then we just use rowCount / Selectivity(tprobe.b=2) as the finally result, which ignores their correlation:

Image

2. What did you expect to see? (Required)

Relatively good estimation result

3. What did you see instead (Required)

Large estimation error

4. What is your TiDB version? (Required)

Master

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions