feat: TVF Part 6.5/X Small TVF analyzer/planning changes #26505

mohsaka · 2025-10-31T23:34:35Z

Description

Smaller changes from #26445 to make the PR more readable. Mainly cleanup/refactoring and some small additions.

Motivation and Context

To make the larger PR simpler. These changes can be standalone.

Impact

Test Plan

Ran updated TestAnalyzer to make sure analysis changes didn't break anything.
Ran TestTableFunctionInvocation to make sure push down table functions aren't broken.

Contributor checklist

Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.
If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

sourcery-ai · 2025-10-31T23:34:41Z

Reviewer's Guide

This PR refactors the table function planning and analysis flow by introducing copartitioning support, enforcing immutability in TableFunctionNode, enriching pass-through column metadata, and aligning tests and planner/parser components with these enhancements.

Class diagram for updated TableFunctionNode and related classes

classDiagram
class TableFunctionNode {
  - Map<String, Argument> arguments
  - List<VariableReferenceExpression> outputVariables
  - List<PlanNode> sources
  - List<TableArgumentProperties> tableArgumentProperties
  - List<List<String>> copartitioningLists
  - TableFunctionHandle handle
  + getArguments()
  + getOutputVariables()
  + getProperOutputs()
  + getTableArgumentProperties()
  + getCopartitioningLists()
  + getHandle()
}
class TableArgumentProperties {
  - String argumentName
  - boolean rowSemantics
  - boolean pruneWhenEmpty
  - PassThroughSpecification passThroughSpecification
  - List<VariableReferenceExpression> requiredColumns
  - Optional<DataOrganizationSpecification> specification
  + getArgumentName()
  + isRowSemantics()
  + isPruneWhenEmpty()
  + getPassThroughSpecification()
  + getRequiredColumns()
  + getSpecification()
}
class PassThroughSpecification {
  - boolean declaredAsPassThrough
  - List<PassThroughColumn> columns
  + isDeclaredAsPassThrough()
  + getColumns()
}
class PassThroughColumn {
  - VariableReferenceExpression outputVariables
  - boolean isPartitioningColumn
  + getOutputVariables()
  + isPartitioningColumn()
}
TableFunctionNode "1" *-- "*" TableArgumentProperties
TableArgumentProperties "1" *-- "1" PassThroughSpecification
PassThroughSpecification "1" *-- "*" PassThroughColumn
PassThroughColumn "1" *-- "1" VariableReferenceExpression

Class diagram for QueryPlanner changes

classDiagram
class QueryPlanner {
  + coerce(PlanBuilder, List<Expression>, Analysis, PlanNodeIdAllocator, VariableAllocator, Metadata)
  + static toSymbolReferences(List<VariableReferenceExpression>)
  + static toSymbolReference(VariableReferenceExpression)
}
class PlanAndMappings {
  - PlanBuilder subPlan
  - Map<Expression, VariableReferenceExpression> mappings
}
QueryPlanner "1" *-- "1" PlanAndMappings

Class diagram for SimplePlanRewriter context changes

classDiagram
class SimplePlanRewriter {
  + getNodeRewriter()
}
class Context {
  + getNodeRewriter()
}
Context "1" *-- "1" SimplePlanRewriter

File-Level Changes

Change	Details	Files
Enhance TableFunctionNode to support copartitioning and immutability	Add copartitioningLists field with JsonProperty and propagate through constructors Replace raw collections with ImmutableList/ImmutableMap copies Update replaceChildren and assignStatsEquivalentPlanNode to include copartitioningLists	`presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java`
Refactor output variables and introduce pass-through specification	Modify getOutputVariables to merge declared outputs with pass-through columns Add PassThroughSpecification and PassThroughColumn nested classes with validation Introduce getProperOutputs and getCopartitioningLists accessors	`presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java`
Revise TableArgumentProperties to carry richer metadata	Add argumentName, replace boolean passThroughColumns with PassThroughSpecification Include requiredColumns list and rename specification getter Adjust JsonCreator signature and add corresponding getters	`presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java`
Update testing infrastructure and TestAnalyzer expectations	Standardize connector table function name constants and remove hard-coded strings Rename two_arguments_function to two_scalar_arguments_function and adjust error position offsets Add assertion changes for required column error in TestAnalyzer	`presto-main-base/src/test/java/com/facebook/presto/connector/tvf/TestingTableFunctions.java` `presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestAnalyzer.java`
Expose and fix planner/analyzer/parser utilities	Make QueryPlanner.coerce and symbol conversion methods public and add toSymbolReference Remove unused Field.newUnqualified Add getNodeRewriter in SimplePlanRewriter Implement createCatalog in LocalQueryRunner via nodeManager/connectorManager Fix verifyRequiredColumns to use visibleFieldCount Adjust AstBuilder descriptor field parsing for nullable types Propagate copartitioningLists in RelationPlanner and UnaliasSymbolReferences	`presto-main-base/src/main/java/com/facebook/presto/sql/planner/QueryPlanner.java` `presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/Field.java` `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/SimplePlanRewriter.java` `presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java` `presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java` `presto-parser/src/main/java/com/facebook/presto/sql/parser/AstBuilder.java` `presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java` `presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/UnaliasSymbolReferences.java`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com> Co-authored-by: Xin Zhang <desertsxin@gmail.com>

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

Add explicit requireNonNull checks for copartitioningLists and its nested lists in the TableFunctionNode constructor to guard against null inputs and avoid potential NPEs.
Consider renaming PassThroughColumn.getOutputVariables to a singular form like getOutputVariable to better reflect that it returns a single variable and reduce naming confusion.
Review the decision to make QueryPlanner.coerce, toSymbolReferences, and toSymbolReference public—if they are only needed for testing or internal use, consider annotating them with @VisibleForTesting or keeping them non-public.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Add explicit requireNonNull checks for copartitioningLists and its nested lists in the TableFunctionNode constructor to guard against null inputs and avoid potential NPEs.
- Consider renaming PassThroughColumn.getOutputVariables to a singular form like getOutputVariable to better reflect that it returns a single variable and reduce naming confusion.
- Review the decision to make QueryPlanner.coerce, toSymbolReferences, and toSymbolReference public—if they are only needed for testing or internal use, consider annotating them with @VisibleForTesting or keeping them non-public.

## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:78-85` </location>
<code_context>
-        this.outputVariables = requireNonNull(outputVariables, "outputVariables is null");
-        this.sources = requireNonNull(sources, "sources is null");
-        this.tableArgumentProperties = requireNonNull(tableArgumentProperties, "tableArgumentProperties is null");
+        this.arguments = ImmutableMap.copyOf(arguments);
+        this.outputVariables = ImmutableList.copyOf(outputVariables);
+        this.sources = ImmutableList.copyOf(sources);
+        this.tableArgumentProperties = ImmutableList.copyOf(tableArgumentProperties);
+        this.copartitioningLists = copartitioningLists.stream()
+                .map(ImmutableList::copyOf)
+                .collect(toImmutableList());
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Immutable collections are used for constructor arguments, but copartitioningLists is not null-checked.

Add a requireNonNull check for copartitioningLists to prevent possible NullPointerExceptions and maintain consistency with other constructor arguments.

```suggestion
        this.arguments = ImmutableMap.copyOf(arguments);
        this.outputVariables = ImmutableList.copyOf(outputVariables);
        this.sources = ImmutableList.copyOf(sources);
        this.tableArgumentProperties = ImmutableList.copyOf(tableArgumentProperties);
        this.copartitioningLists = requireNonNull(copartitioningLists, "copartitioningLists is null").stream()
                .map(ImmutableList::copyOf)
                .collect(toImmutableList());
        this.handle = requireNonNull(handle, "handle is null");
```
</issue_to_address>

### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:116` </location>
<code_context>
+        return variables.build();
+    }
+
+    public List<VariableReferenceExpression> getProperOutputs()
     {
         return outputVariables;
</code_context>

<issue_to_address>
**suggestion:** Method name getProperOutputs is ambiguous.

Consider renaming to getOutputVariablesOnly or a more descriptive name, since getOutputVariables is overridden with different logic.

Suggested implementation:

```java
    public List<VariableReferenceExpression> getOutputVariablesOnly()
    {
        return outputVariables;
    }

```

If there are any calls to `getProperOutputs()` elsewhere in this file or in other files, they should be updated to `getOutputVariablesOnly()` to match the new method name.
</issue_to_address>

### Comment 3
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:265` </location>
<code_context>
+
+    public static class PassThroughColumn
+    {
+        private final VariableReferenceExpression outputVariables;
+        private final boolean isPartitioningColumn;
+
</code_context>

<issue_to_address>
**nitpick:** Field name outputVariables in PassThroughColumn is misleading.

Since outputVariables is a single VariableReferenceExpression, renaming it to outputVariable would improve clarity and maintain naming consistency.
</issue_to_address>

### Comment 4
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:278` </location>
<code_context>
+        }
+
+        @JsonProperty
+        public VariableReferenceExpression getOutputVariables()
+        {
+            return outputVariables;
</code_context>

<issue_to_address>
**nitpick:** Method getOutputVariables in PassThroughColumn should be getOutputVariable.

Rename the getter to getOutputVariable for consistency with the field name and to distinguish it from the plural method in the parent class.
</issue_to_address>

### Comment 5
<location> `presto-main-base/src/test/java/com/facebook/presto/connector/tvf/TestingTableFunctions.java:539-77` </location>
<code_context>
+    public static class DifferentArgumentTypesFunction
</code_context>

<issue_to_address>
**suggestion (testing):** Missing negative/edge case tests for DifferentArgumentTypesFunction.

Please add tests for error conditions and edge cases, such as missing or invalid arguments, to improve validation coverage for this function.

Suggested implementation:

```java
    public static class DifferentArgumentTypesFunction
            extends AbstractConnectorTableFunction
    {
        public static final String FUNCTION_NAME = "different_arguments_function";
        public DifferentArgumentTypesFunction()
        {
            super(
                    SCHEMA_NAME,
                    FUNCTION_NAME,
                    ImmutableList.of(
                            TableArgumentSpecification.builder()

        // Edge/negative case tests for DifferentArgumentTypesFunction
        @Test(expected = IllegalArgumentException.class)
        public void testMissingRequiredArgument()
        {
            // Attempt to call the function with missing required argument
            // This should throw an IllegalArgumentException or similar
            DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
            // Simulate missing argument (actual invocation may differ based on API)
            function.apply(/* missing required argument */);
        }

        @Test(expected = ClassCastException.class)
        public void testInvalidArgumentType()
        {
            // Attempt to call the function with an argument of the wrong type
            DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
            // Simulate invalid type (actual invocation may differ based on API)
            function.apply("invalid_type_instead_of_expected");
        }

        @Test
        public void testBoundaryValues()
        {
            DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
            // Test with boundary values, e.g., empty table, nulls, etc.
            try {
                function.apply(/* boundary value, e.g., empty table or null */);
            } catch (Exception e) {
                // Assert that the function handles boundary values gracefully
                assertTrue(e instanceof IllegalArgumentException || e instanceof NullPointerException);
            }
        }

```

You may need to adjust the `function.apply(...)` calls to match the actual method signature and argument requirements of `DifferentArgumentTypesFunction`. Also, ensure that the test annotations (`@Test`) are compatible with your test framework (e.g., JUnit).
</issue_to_address>

### Comment 6
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestAnalyzer.java:1971` </location>
<code_context>
-        analyze("SELECT * FROM TABLE(system.two_arguments_function(text => 'foo', number => 1))");
+        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
+        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");
+        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', 1))");
+        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => 1))");

</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding tests for edge cases with null and default arguments for two_scalar_arguments_function.

Please add tests for cases with default values, single arguments, and combinations of null and default values to fully cover input scenarios for two_scalar_arguments_function.

Suggested implementation:

```java
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");

        // Edge cases: null and default arguments
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, 1))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', NULL))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => NULL, number => 1))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => NULL))");

        // If the function supports default values, test with omitted arguments
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function())"); // both default
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => DEFAULT, number => DEFAULT))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(DEFAULT, DEFAULT))");

        // Combinations of null and default
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, DEFAULT))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(DEFAULT, NULL))");

        // Single argument cases (already present, but ensure both positional and named)
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', 1))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => 1))");

        // Invalid cases: too many arguments, all null, etc.
        assertFails(TABLE_FUNCTION_INVALID_ARGUMENTS, "line 1:58: Too many arguments. Expected at most 2 arguments, got 3 arguments", "SELECT * FROM TABLE(system.two_scalar_arguments_function(1, 2, 3))");
        assertFails(TABLE_FUNCTION_INVALID_ARGUMENTS, "line 1:58: All arguments are null", "SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, NULL))");

```

- If `DEFAULT` is not a valid keyword in your SQL dialect, replace it with the appropriate syntax for default arguments.
- If the function does not support omitted arguments or defaults, remove those tests or adjust them to expect failures.
- Ensure that the error messages in `assertFails` match the actual error messages produced by your analyzer for null/default misuse.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2025-10-31T23:45:43Z

presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java

+        return variables.build();
+    }
+
+    public List<VariableReferenceExpression> getProperOutputs()


suggestion: Method name getProperOutputs is ambiguous.

Consider renaming to getOutputVariablesOnly or a more descriptive name, since getOutputVariables is overridden with different logic.

Suggested implementation:

public List<VariableReferenceExpression> getOutputVariablesOnly() { return outputVariables; }

If there are any calls to getProperOutputs() elsewhere in this file or in other files, they should be updated to getOutputVariablesOnly() to match the new method name.

sourcery-ai · 2025-10-31T23:45:43Z

presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java

+
+    public static class PassThroughColumn
+    {
+        private final VariableReferenceExpression outputVariables;


nitpick: Field name outputVariables in PassThroughColumn is misleading.

Since outputVariables is a single VariableReferenceExpression, renaming it to outputVariable would improve clarity and maintain naming consistency.

sourcery-ai · 2025-10-31T23:45:43Z

presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java

+        }
+
+        @JsonProperty
+        public VariableReferenceExpression getOutputVariables()


nitpick: Method getOutputVariables in PassThroughColumn should be getOutputVariable.

Rename the getter to getOutputVariable for consistency with the field name and to distinguish it from the plural method in the parent class.

prestodb-ci added the from:IBM PR from IBM label Oct 31, 2025

Small TVF analyzer/planning changes

21652b4

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com> Co-authored-by: Xin Zhang <desertsxin@gmail.com>

mohsaka force-pushed the tvf_small_changes branch from daf3bd6 to 21652b4 Compare October 31, 2025 23:40

mohsaka marked this pull request as ready for review October 31, 2025 23:44

mohsaka requested review from a team, ClarenceThreepwood, feilong-liu and jaystarshot as code owners October 31, 2025 23:44

prestodb-ci requested review from a team, Dilli-Babu-Godari and ShahimSharafudeen and removed request for a team October 31, 2025 23:44

sourcery-ai bot reviewed Oct 31, 2025

View reviewed changes

mohsaka requested a review from aditi-pandit October 31, 2025 23:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: TVF Part 6.5/X Small TVF analyzer/planning changes #26505

feat: TVF Part 6.5/X Small TVF analyzer/planning changes #26505

mohsaka commented Oct 31, 2025 •

edited

Loading

sourcery-ai bot commented Oct 31, 2025 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

sourcery-ai bot left a comment

sourcery-ai bot Oct 31, 2025

sourcery-ai bot Oct 31, 2025

sourcery-ai bot Oct 31, 2025

Labels

2 participants

feat: TVF Part 6.5/X Small TVF analyzer/planning changes #26505

Are you sure you want to change the base?

feat: TVF Part 6.5/X Small TVF analyzer/planning changes #26505

Conversation

mohsaka commented Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

Impact

Test Plan

Contributor checklist

Release Notes

sourcery-ai bot commented Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Class diagram for updated TableFunctionNode and related classes

Class diagram for QueryPlanner changes

Class diagram for SimplePlanRewriter context changes

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

sourcery-ai bot left a comment

Choose a reason for hiding this comment

sourcery-ai bot Oct 31, 2025

Choose a reason for hiding this comment

sourcery-ai bot Oct 31, 2025

Choose a reason for hiding this comment

sourcery-ai bot Oct 31, 2025

Choose a reason for hiding this comment

Labels

2 participants

mohsaka commented Oct 31, 2025 •

edited

Loading

sourcery-ai bot commented Oct 31, 2025 •

edited

Loading