Skip to content

Conversation

@henrybw
Copy link
Contributor

@henrybw henrybw commented Oct 31, 2025

What problem does this PR solve?

Issue Number: ref #63856

Problem Summary: Profiling the time spent in the optimizer repeatedly executing, for 2 minutes, a query on a table like this, but with many more indexes (500) and columns (6,500), reveals that 34% of the time now is spent in expression.IndexCol2Col(), due to this function being passed a slice of columns that the function has to search linearly with a for loop. Instead of scanning this slice, we should be using a map to reduce lookups of column by ID from O(N) to O(1).

IndexCol2Col_pprof

What changed and how does it work?

Similar to #64053, this is a preliminary PR that establishes the foundation for replacing the O(N) loop with an O(1) map lookup. The goal is to eventually add a colIdxsByName map[string]int parameter to expression.IndexCol2Col() that can be supplied by the logical data source.

  • Add a new model.GetColIdxMapping() function that builds this map from a slice of model.ColumnInfo structs, along with a TestColIdxMapping() test case for this new function.
    • There are two reasons why this function is in model: (1) tests in the util/ranger package depend on expression.IndexInfo2PrefixCols() and will need to build this map too, and (2) it is next to the definition of the model.ColumnInfo that it operates on.
  • Add a ColIdxsByName map to the logical data source in the planner.
  • Refactor all places in the planner that append columns to ds.Columns and ds.Schema() directly to instead call a new AppendColumn() function that also ensures the column is tracked by the ds.ColIdxsByName map.
    • The only exception is PruneColumns(), which has been updated to also remove the pruned column from ds.ColIdxsByName and update the affected column indices in the map.
  • Add a new TestDataSourceColumnMaps test case that ensures that AppendColumn() maintains coherency between ds.Columns and ds.ColIdxsByName, and also ensures that AppendTableCol() maintains coherency between ds.TblCols and ds.TblColsByID.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 31, 2025
@ti-chi-bot
Copy link

ti-chi-bot bot commented Oct 31, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hawkingrei, wjhuang2016 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tiprow
Copy link

tiprow bot commented Oct 31, 2025

Hi @henrybw. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@codecov
Copy link

codecov bot commented Oct 31, 2025

Codecov Report

❌ Patch coverage is 96.87500% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 73.5241%. Comparing base (bdd2b6f) to head (eb145bd).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #64204        +/-   ##
================================================
+ Coverage   72.7334%   73.5241%   +0.7907%     
================================================
  Files          1859       1860         +1     
  Lines        503870     505223      +1353     
================================================
+ Hits         366482     371461      +4979     
+ Misses       115127     111646      -3481     
+ Partials      22261      22116       -145     
Flag Coverage Δ
integration 41.8379% <90.6250%> (?)
unit 72.6008% <96.8750%> (+0.3017%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.8700% <ø> (ø)
parser ∅ <ø> (∅)
br 46.4181% <ø> (+0.0328%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
@hawkingrei hawkingrei requested a review from Copilot October 31, 2025 03:19
@hawkingrei
Copy link
Member

/retest

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a ColIdxsByName map to track column name to index mappings across various planner data structures, maintaining consistency between column slices and their name-based lookups. The changes also add an AppendColumn helper method to ensure these mappings stay synchronized when columns are added.

Key changes:

  • Added ColIdxsByName map field to DataSource, LogicalIndexScan, and PhysicalTableScan structs
  • Introduced AppendColumn helper method to maintain synchronization between columns and name mappings
  • Added GetColIdxsByName utility function to build name-to-index maps
  • Implemented map updates during column pruning operations

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pkg/meta/model/column.go Adds GetColIdxsByName utility function to build column name-to-index maps
pkg/meta/model/column_test.go Tests the new GetColIdxsByName function
pkg/planner/core/operator/logicalop/logical_datasource.go Adds ColIdxsByName field, implements AppendColumn helper, and updates column pruning logic to maintain map consistency
pkg/planner/core/operator/logicalop/logical_index_scan.go Adds ColIdxsByName field to LogicalIndexScan struct
pkg/planner/core/operator/physicalop/physical_table_scan.go Adds ColIdxsByName field and ensures it's cloned/copied when creating PhysicalTableScan instances
pkg/planner/core/logical_plan_builder.go Updates DataSource construction to use AppendColumn helper and initialize ColIdxsByName map
pkg/planner/core/planbuilder.go Initializes ColIdxsByName when building PhysicalIndexLookUpReader
pkg/planner/core/find_best_task.go Populates ColIdxsByName when converting to index scan
pkg/planner/core/exhaust_physical_plans.go Passes through ColIdxsByName when constructing table scan tasks
pkg/planner/core/planbuilder_test.go Adds test to verify AppendColumn maintains consistency between columns and name mappings

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@hawkingrei
Copy link
Member

/retest

Copy link
Contributor

@D3Hunter D3Hunter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta part lgtm

i remember search from map is slower compared with iterating slice, when the number of items is small, as hashing itself have overhead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

3 participants