-
Notifications
You must be signed in to change notification settings - Fork 6k
planner: maintain a map of column indices by name in DataSource #64204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Hi @henrybw. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #64204 +/- ##
================================================
+ Coverage 72.7334% 73.5241% +0.7907%
================================================
Files 1859 1860 +1
Lines 503870 505223 +1353
================================================
+ Hits 366482 371461 +4979
+ Misses 115127 111646 -3481
+ Partials 22261 22116 -145
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a ColIdxsByName map to track column name to index mappings across various planner data structures, maintaining consistency between column slices and their name-based lookups. The changes also add an AppendColumn helper method to ensure these mappings stay synchronized when columns are added.
Key changes:
- Added
ColIdxsByNamemap field toDataSource,LogicalIndexScan, andPhysicalTableScanstructs - Introduced
AppendColumnhelper method to maintain synchronization between columns and name mappings - Added
GetColIdxsByNameutility function to build name-to-index maps - Implemented map updates during column pruning operations
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pkg/meta/model/column.go | Adds GetColIdxsByName utility function to build column name-to-index maps |
| pkg/meta/model/column_test.go | Tests the new GetColIdxsByName function |
| pkg/planner/core/operator/logicalop/logical_datasource.go | Adds ColIdxsByName field, implements AppendColumn helper, and updates column pruning logic to maintain map consistency |
| pkg/planner/core/operator/logicalop/logical_index_scan.go | Adds ColIdxsByName field to LogicalIndexScan struct |
| pkg/planner/core/operator/physicalop/physical_table_scan.go | Adds ColIdxsByName field and ensures it's cloned/copied when creating PhysicalTableScan instances |
| pkg/planner/core/logical_plan_builder.go | Updates DataSource construction to use AppendColumn helper and initialize ColIdxsByName map |
| pkg/planner/core/planbuilder.go | Initializes ColIdxsByName when building PhysicalIndexLookUpReader |
| pkg/planner/core/find_best_task.go | Populates ColIdxsByName when converting to index scan |
| pkg/planner/core/exhaust_physical_plans.go | Passes through ColIdxsByName when constructing table scan tasks |
| pkg/planner/core/planbuilder_test.go | Adds test to verify AppendColumn maintains consistency between columns and name mappings |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meta part lgtm
i remember search from map is slower compared with iterating slice, when the number of items is small, as hashing itself have overhead
What problem does this PR solve?
Issue Number: ref #63856
Problem Summary: Profiling the time spent in the optimizer repeatedly executing, for 2 minutes, a query on a table like this, but with many more indexes (500) and columns (6,500), reveals that 34% of the time now is spent in
expression.IndexCol2Col(), due to this function being passed a slice of columns that the function has to search linearly with a for loop. Instead of scanning this slice, we should be using a map to reduce lookups of column by ID from O(N) to O(1).What changed and how does it work?
Similar to #64053, this is a preliminary PR that establishes the foundation for replacing the O(N) loop with an O(1) map lookup. The goal is to eventually add a
colIdxsByName map[string]intparameter toexpression.IndexCol2Col()that can be supplied by the logical data source.model.GetColIdxMapping()function that builds this map from a slice ofmodel.ColumnInfostructs, along with aTestColIdxMapping()test case for this new function.model: (1) tests in theutil/rangerpackage depend onexpression.IndexInfo2PrefixCols()and will need to build this map too, and (2) it is next to the definition of themodel.ColumnInfothat it operates on.ColIdxsByNamemap to the logical data source in the planner.ds.Columnsandds.Schema()directly to instead call a newAppendColumn()function that also ensures the column is tracked by theds.ColIdxsByNamemap.PruneColumns(), which has been updated to also remove the pruned column fromds.ColIdxsByNameand update the affected column indices in the map.TestDataSourceColumnMapstest case that ensures thatAppendColumn()maintains coherency betweends.Columnsandds.ColIdxsByName, and also ensures thatAppendTableCol()maintains coherency betweends.TblColsandds.TblColsByID.Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.