planner: maintain a map of column indices by name in DataSource #64204

henrybw · 2025-10-31T02:00:16Z

What problem does this PR solve?

Issue Number: ref #63856

Problem Summary: Profiling the time spent in the optimizer repeatedly executing, for 2 minutes, a query on a table like this, but with many more indexes (500) and columns (6,500), reveals that 34% of the time now is spent in expression.IndexCol2Col(), due to this function being passed a slice of columns that the function has to search linearly with a for loop. Instead of scanning this slice, we should be using a map to reduce lookups of column by ID from O(N) to O(1).

What changed and how does it work?

Similar to #64053, this is a preliminary PR that establishes the foundation for replacing the O(N) loop with an O(1) map lookup. The goal is to eventually add a colIdxsByName map[string]int parameter to expression.IndexCol2Col() that can be supplied by the logical data source.

Add a new model.GetColIdxMapping() function that builds this map from a slice of model.ColumnInfo structs, along with a TestColIdxMapping() test case for this new function.
- There are two reasons why this function is in model: (1) tests in the util/ranger package depend on expression.IndexInfo2PrefixCols() and will need to build this map too, and (2) it is next to the definition of the model.ColumnInfo that it operates on.
Add a ColIdxsByName map to the logical data source in the planner.
Refactor all places in the planner that append columns to ds.Columns and ds.Schema() directly to instead call a new AppendColumn() function that also ensures the column is tracked by the ds.ColIdxsByName map.
- The only exception is PruneColumns(), which has been updated to also remove the pruned column from ds.ColIdxsByName and update the affected column indices in the map.
Add a new TestDataSourceColumnMaps test case that ensures that AppendColumn() maintains coherency between ds.Columns and ds.ColIdxsByName, and also ensures that AppendTableCol() maintains coherency between ds.TblCols and ds.TblColsByID.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

ti-chi-bot · 2025-10-31T02:00:27Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hawkingrei, wjhuang2016 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

pkg/meta/OWNERS
pkg/planner/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tiprow · 2025-10-31T02:00:35Z

Hi @henrybw. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov · 2025-10-31T02:21:30Z

Codecov Report

❌ Patch coverage is 96.87500% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 73.5241%. Comparing base (bdd2b6f) to head (eb145bd).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #64204        +/-   ##
================================================
+ Coverage   72.7334%   73.5241%   +0.7907%     
================================================
  Files          1859       1860         +1     
  Lines        503870     505223      +1353     
================================================
+ Hits         366482     371461      +4979     
+ Misses       115127     111646      -3481     
+ Partials      22261      22116       -145

Flag	Coverage Δ
integration	`41.8379% <90.6250%> (?)`
unit	`72.6008% <96.8750%> (+0.3017%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`52.8700% <ø> (ø)`
parser	`∅ <ø> (∅)`
br	`46.4181% <ø> (+0.0328%)`	⬆️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

hawkingrei · 2025-10-31T03:22:18Z

/retest

Copilot

Pull Request Overview

This PR introduces a ColIdxsByName map to track column name to index mappings across various planner data structures, maintaining consistency between column slices and their name-based lookups. The changes also add an AppendColumn helper method to ensure these mappings stay synchronized when columns are added.

Key changes:

Added ColIdxsByName map field to DataSource, LogicalIndexScan, and PhysicalTableScan structs
Introduced AppendColumn helper method to maintain synchronization between columns and name mappings
Added GetColIdxsByName utility function to build name-to-index maps
Implemented map updates during column pruning operations

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
pkg/meta/model/column.go	Adds `GetColIdxsByName` utility function to build column name-to-index maps
pkg/meta/model/column_test.go	Tests the new `GetColIdxsByName` function
pkg/planner/core/operator/logicalop/logical_datasource.go	Adds `ColIdxsByName` field, implements `AppendColumn` helper, and updates column pruning logic to maintain map consistency
pkg/planner/core/operator/logicalop/logical_index_scan.go	Adds `ColIdxsByName` field to LogicalIndexScan struct
pkg/planner/core/operator/physicalop/physical_table_scan.go	Adds `ColIdxsByName` field and ensures it's cloned/copied when creating PhysicalTableScan instances
pkg/planner/core/logical_plan_builder.go	Updates DataSource construction to use `AppendColumn` helper and initialize `ColIdxsByName` map
pkg/planner/core/planbuilder.go	Initializes `ColIdxsByName` when building PhysicalIndexLookUpReader
pkg/planner/core/find_best_task.go	Populates `ColIdxsByName` when converting to index scan
pkg/planner/core/exhaust_physical_plans.go	Passes through `ColIdxsByName` when constructing table scan tasks
pkg/planner/core/planbuilder_test.go	Adds test to verify `AppendColumn` maintains consistency between columns and name mappings

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pkg/planner/core/operator/logicalop/logical_datasource.go

hawkingrei · 2025-10-31T03:30:40Z

/retest

D3Hunter

meta part lgtm

i remember search from map is slower compared with iterating slice, when the number of items is small, as hashing itself have overhead

pkg/meta/model/column.go

planner: maintain a map of column indices by name in DataSource

5a57bc0

ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 31, 2025

hawkingrei requested a review from Copilot October 31, 2025 03:19

Copilot AI reviewed Oct 31, 2025

View reviewed changes

pkg/planner/core/operator/logicalop/logical_datasource.go Outdated Show resolved Hide resolved

D3Hunter reviewed Oct 31, 2025

View reviewed changes

pkg/meta/model/column.go Outdated Show resolved Hide resolved

henrybw added 3 commits October 31, 2025 12:23

don't shadow the loop induction variable

1078e5e

GetColIdxsByName -> GetColIdxMapping

21d6381

TestColIdxsByName -> TestColIdxMapping

eb145bd

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

planner: maintain a map of column indices by name in DataSource #64204

planner: maintain a map of column indices by name in DataSource #64204

henrybw commented Oct 31, 2025 •

edited

Loading

ti-chi-bot bot commented Oct 31, 2025

tiprow bot commented Oct 31, 2025

codecov bot commented Oct 31, 2025 •

edited

Loading

hawkingrei commented Oct 31, 2025

Copilot AI left a comment

Uh oh!

hawkingrei commented Oct 31, 2025

D3Hunter left a comment

Uh oh!

Labels

3 participants

planner: maintain a map of column indices by name in DataSource #64204

Are you sure you want to change the base?

planner: maintain a map of column indices by name in DataSource #64204

Conversation

henrybw commented Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

What changed and how does it work?

Check List

Release note

ti-chi-bot bot commented Oct 31, 2025

tiprow bot commented Oct 31, 2025

codecov bot commented Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

hawkingrei commented Oct 31, 2025

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

hawkingrei commented Oct 31, 2025

D3Hunter left a comment

Choose a reason for hiding this comment

Uh oh!

Labels

3 participants

henrybw commented Oct 31, 2025 •

edited

Loading

codecov bot commented Oct 31, 2025 •

edited

Loading