-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat: TVF Part 6.5/X Small TVF analyzer/planning changes #26505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's GuideThis PR refactors the table function planning and analysis flow by introducing copartitioning support, enforcing immutability in TableFunctionNode, enriching pass-through column metadata, and aligning tests and planner/parser components with these enhancements. Class diagram for updated TableFunctionNode and related classesclassDiagram
class TableFunctionNode {
- Map<String, Argument> arguments
- List<VariableReferenceExpression> outputVariables
- List<PlanNode> sources
- List<TableArgumentProperties> tableArgumentProperties
- List<List<String>> copartitioningLists
- TableFunctionHandle handle
+ getArguments()
+ getOutputVariables()
+ getProperOutputs()
+ getTableArgumentProperties()
+ getCopartitioningLists()
+ getHandle()
}
class TableArgumentProperties {
- String argumentName
- boolean rowSemantics
- boolean pruneWhenEmpty
- PassThroughSpecification passThroughSpecification
- List<VariableReferenceExpression> requiredColumns
- Optional<DataOrganizationSpecification> specification
+ getArgumentName()
+ isRowSemantics()
+ isPruneWhenEmpty()
+ getPassThroughSpecification()
+ getRequiredColumns()
+ getSpecification()
}
class PassThroughSpecification {
- boolean declaredAsPassThrough
- List<PassThroughColumn> columns
+ isDeclaredAsPassThrough()
+ getColumns()
}
class PassThroughColumn {
- VariableReferenceExpression outputVariables
- boolean isPartitioningColumn
+ getOutputVariables()
+ isPartitioningColumn()
}
TableFunctionNode "1" *-- "*" TableArgumentProperties
TableArgumentProperties "1" *-- "1" PassThroughSpecification
PassThroughSpecification "1" *-- "*" PassThroughColumn
PassThroughColumn "1" *-- "1" VariableReferenceExpression
Class diagram for QueryPlanner changesclassDiagram
class QueryPlanner {
+ coerce(PlanBuilder, List<Expression>, Analysis, PlanNodeIdAllocator, VariableAllocator, Metadata)
+ static toSymbolReferences(List<VariableReferenceExpression>)
+ static toSymbolReference(VariableReferenceExpression)
}
class PlanAndMappings {
- PlanBuilder subPlan
- Map<Expression, VariableReferenceExpression> mappings
}
QueryPlanner "1" *-- "1" PlanAndMappings
Class diagram for SimplePlanRewriter context changesclassDiagram
class SimplePlanRewriter {
+ getNodeRewriter()
}
class Context {
+ getNodeRewriter()
}
Context "1" *-- "1" SimplePlanRewriter
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com> Co-authored-by: Xin Zhang <desertsxin@gmail.com>
daf3bd6 to
21652b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Add explicit requireNonNull checks for copartitioningLists and its nested lists in the TableFunctionNode constructor to guard against null inputs and avoid potential NPEs.
- Consider renaming PassThroughColumn.getOutputVariables to a singular form like getOutputVariable to better reflect that it returns a single variable and reduce naming confusion.
- Review the decision to make QueryPlanner.coerce, toSymbolReferences, and toSymbolReference public—if they are only needed for testing or internal use, consider annotating them with @VisibleForTesting or keeping them non-public.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Add explicit requireNonNull checks for copartitioningLists and its nested lists in the TableFunctionNode constructor to guard against null inputs and avoid potential NPEs.
- Consider renaming PassThroughColumn.getOutputVariables to a singular form like getOutputVariable to better reflect that it returns a single variable and reduce naming confusion.
- Review the decision to make QueryPlanner.coerce, toSymbolReferences, and toSymbolReference public—if they are only needed for testing or internal use, consider annotating them with @VisibleForTesting or keeping them non-public.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:78-85` </location>
<code_context>
- this.outputVariables = requireNonNull(outputVariables, "outputVariables is null");
- this.sources = requireNonNull(sources, "sources is null");
- this.tableArgumentProperties = requireNonNull(tableArgumentProperties, "tableArgumentProperties is null");
+ this.arguments = ImmutableMap.copyOf(arguments);
+ this.outputVariables = ImmutableList.copyOf(outputVariables);
+ this.sources = ImmutableList.copyOf(sources);
+ this.tableArgumentProperties = ImmutableList.copyOf(tableArgumentProperties);
+ this.copartitioningLists = copartitioningLists.stream()
+ .map(ImmutableList::copyOf)
+ .collect(toImmutableList());
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Immutable collections are used for constructor arguments, but copartitioningLists is not null-checked.
Add a requireNonNull check for copartitioningLists to prevent possible NullPointerExceptions and maintain consistency with other constructor arguments.
```suggestion
this.arguments = ImmutableMap.copyOf(arguments);
this.outputVariables = ImmutableList.copyOf(outputVariables);
this.sources = ImmutableList.copyOf(sources);
this.tableArgumentProperties = ImmutableList.copyOf(tableArgumentProperties);
this.copartitioningLists = requireNonNull(copartitioningLists, "copartitioningLists is null").stream()
.map(ImmutableList::copyOf)
.collect(toImmutableList());
this.handle = requireNonNull(handle, "handle is null");
```
</issue_to_address>
### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:116` </location>
<code_context>
+ return variables.build();
+ }
+
+ public List<VariableReferenceExpression> getProperOutputs()
{
return outputVariables;
</code_context>
<issue_to_address>
**suggestion:** Method name getProperOutputs is ambiguous.
Consider renaming to getOutputVariablesOnly or a more descriptive name, since getOutputVariables is overridden with different logic.
Suggested implementation:
```java
public List<VariableReferenceExpression> getOutputVariablesOnly()
{
return outputVariables;
}
```
If there are any calls to `getProperOutputs()` elsewhere in this file or in other files, they should be updated to `getOutputVariablesOnly()` to match the new method name.
</issue_to_address>
### Comment 3
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:265` </location>
<code_context>
+
+ public static class PassThroughColumn
+ {
+ private final VariableReferenceExpression outputVariables;
+ private final boolean isPartitioningColumn;
+
</code_context>
<issue_to_address>
**nitpick:** Field name outputVariables in PassThroughColumn is misleading.
Since outputVariables is a single VariableReferenceExpression, renaming it to outputVariable would improve clarity and maintain naming consistency.
</issue_to_address>
### Comment 4
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:278` </location>
<code_context>
+ }
+
+ @JsonProperty
+ public VariableReferenceExpression getOutputVariables()
+ {
+ return outputVariables;
</code_context>
<issue_to_address>
**nitpick:** Method getOutputVariables in PassThroughColumn should be getOutputVariable.
Rename the getter to getOutputVariable for consistency with the field name and to distinguish it from the plural method in the parent class.
</issue_to_address>
### Comment 5
<location> `presto-main-base/src/test/java/com/facebook/presto/connector/tvf/TestingTableFunctions.java:539-77` </location>
<code_context>
+ public static class DifferentArgumentTypesFunction
</code_context>
<issue_to_address>
**suggestion (testing):** Missing negative/edge case tests for DifferentArgumentTypesFunction.
Please add tests for error conditions and edge cases, such as missing or invalid arguments, to improve validation coverage for this function.
Suggested implementation:
```java
public static class DifferentArgumentTypesFunction
extends AbstractConnectorTableFunction
{
public static final String FUNCTION_NAME = "different_arguments_function";
public DifferentArgumentTypesFunction()
{
super(
SCHEMA_NAME,
FUNCTION_NAME,
ImmutableList.of(
TableArgumentSpecification.builder()
// Edge/negative case tests for DifferentArgumentTypesFunction
@Test(expected = IllegalArgumentException.class)
public void testMissingRequiredArgument()
{
// Attempt to call the function with missing required argument
// This should throw an IllegalArgumentException or similar
DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
// Simulate missing argument (actual invocation may differ based on API)
function.apply(/* missing required argument */);
}
@Test(expected = ClassCastException.class)
public void testInvalidArgumentType()
{
// Attempt to call the function with an argument of the wrong type
DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
// Simulate invalid type (actual invocation may differ based on API)
function.apply("invalid_type_instead_of_expected");
}
@Test
public void testBoundaryValues()
{
DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
// Test with boundary values, e.g., empty table, nulls, etc.
try {
function.apply(/* boundary value, e.g., empty table or null */);
} catch (Exception e) {
// Assert that the function handles boundary values gracefully
assertTrue(e instanceof IllegalArgumentException || e instanceof NullPointerException);
}
}
```
You may need to adjust the `function.apply(...)` calls to match the actual method signature and argument requirements of `DifferentArgumentTypesFunction`. Also, ensure that the test annotations (`@Test`) are compatible with your test framework (e.g., JUnit).
</issue_to_address>
### Comment 6
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestAnalyzer.java:1971` </location>
<code_context>
- analyze("SELECT * FROM TABLE(system.two_arguments_function(text => 'foo', number => 1))");
+ analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
+ analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");
+ analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', 1))");
+ analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => 1))");
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding tests for edge cases with null and default arguments for two_scalar_arguments_function.
Please add tests for cases with default values, single arguments, and combinations of null and default values to fully cover input scenarios for two_scalar_arguments_function.
Suggested implementation:
```java
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");
// Edge cases: null and default arguments
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, 1))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', NULL))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => NULL, number => 1))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => NULL))");
// If the function supports default values, test with omitted arguments
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function())"); // both default
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => DEFAULT, number => DEFAULT))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(DEFAULT, DEFAULT))");
// Combinations of null and default
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, DEFAULT))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(DEFAULT, NULL))");
// Single argument cases (already present, but ensure both positional and named)
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', 1))");
analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => 1))");
// Invalid cases: too many arguments, all null, etc.
assertFails(TABLE_FUNCTION_INVALID_ARGUMENTS, "line 1:58: Too many arguments. Expected at most 2 arguments, got 3 arguments", "SELECT * FROM TABLE(system.two_scalar_arguments_function(1, 2, 3))");
assertFails(TABLE_FUNCTION_INVALID_ARGUMENTS, "line 1:58: All arguments are null", "SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, NULL))");
```
- If `DEFAULT` is not a valid keyword in your SQL dialect, replace it with the appropriate syntax for default arguments.
- If the function does not support omitted arguments or defaults, remove those tests or adjust them to expect failures.
- Ensure that the error messages in `assertFails` match the actual error messages produced by your analyzer for null/default misuse.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| return variables.build(); | ||
| } | ||
|
|
||
| public List<VariableReferenceExpression> getProperOutputs() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Method name getProperOutputs is ambiguous.
Consider renaming to getOutputVariablesOnly or a more descriptive name, since getOutputVariables is overridden with different logic.
Suggested implementation:
public List<VariableReferenceExpression> getOutputVariablesOnly()
{
return outputVariables;
}If there are any calls to getProperOutputs() elsewhere in this file or in other files, they should be updated to getOutputVariablesOnly() to match the new method name.
|
|
||
| public static class PassThroughColumn | ||
| { | ||
| private final VariableReferenceExpression outputVariables; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Field name outputVariables in PassThroughColumn is misleading.
Since outputVariables is a single VariableReferenceExpression, renaming it to outputVariable would improve clarity and maintain naming consistency.
| } | ||
|
|
||
| @JsonProperty | ||
| public VariableReferenceExpression getOutputVariables() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Method getOutputVariables in PassThroughColumn should be getOutputVariable.
Rename the getter to getOutputVariable for consistency with the field name and to distinguish it from the plural method in the parent class.
Description
Smaller changes from #26445 to make the PR more readable. Mainly cleanup/refactoring and some small additions.
Motivation and Context
To make the larger PR simpler. These changes can be standalone.
Impact
Test Plan
Ran updated TestAnalyzer to make sure analysis changes didn't break anything.
Ran TestTableFunctionInvocation to make sure push down table functions aren't broken.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.