Skip to content

Conversation

@mohsaka
Copy link
Contributor

@mohsaka mohsaka commented Oct 31, 2025

Description

Smaller changes from #26445 to make the PR more readable. Mainly cleanup/refactoring and some small additions.

Motivation and Context

To make the larger PR simpler. These changes can be standalone.

Impact

Test Plan

Ran updated TestAnalyzer to make sure analysis changes didn't break anything.
Ran TestTableFunctionInvocation to make sure push down table functions aren't broken.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Oct 31, 2025
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Oct 31, 2025

Reviewer's Guide

This PR refactors the table function planning and analysis flow by introducing copartitioning support, enforcing immutability in TableFunctionNode, enriching pass-through column metadata, and aligning tests and planner/parser components with these enhancements.

Class diagram for updated TableFunctionNode and related classes

classDiagram
class TableFunctionNode {
  - Map<String, Argument> arguments
  - List<VariableReferenceExpression> outputVariables
  - List<PlanNode> sources
  - List<TableArgumentProperties> tableArgumentProperties
  - List<List<String>> copartitioningLists
  - TableFunctionHandle handle
  + getArguments()
  + getOutputVariables()
  + getProperOutputs()
  + getTableArgumentProperties()
  + getCopartitioningLists()
  + getHandle()
}
class TableArgumentProperties {
  - String argumentName
  - boolean rowSemantics
  - boolean pruneWhenEmpty
  - PassThroughSpecification passThroughSpecification
  - List<VariableReferenceExpression> requiredColumns
  - Optional<DataOrganizationSpecification> specification
  + getArgumentName()
  + isRowSemantics()
  + isPruneWhenEmpty()
  + getPassThroughSpecification()
  + getRequiredColumns()
  + getSpecification()
}
class PassThroughSpecification {
  - boolean declaredAsPassThrough
  - List<PassThroughColumn> columns
  + isDeclaredAsPassThrough()
  + getColumns()
}
class PassThroughColumn {
  - VariableReferenceExpression outputVariables
  - boolean isPartitioningColumn
  + getOutputVariables()
  + isPartitioningColumn()
}
TableFunctionNode "1" *-- "*" TableArgumentProperties
TableArgumentProperties "1" *-- "1" PassThroughSpecification
PassThroughSpecification "1" *-- "*" PassThroughColumn
PassThroughColumn "1" *-- "1" VariableReferenceExpression
Loading

Class diagram for QueryPlanner changes

classDiagram
class QueryPlanner {
  + coerce(PlanBuilder, List<Expression>, Analysis, PlanNodeIdAllocator, VariableAllocator, Metadata)
  + static toSymbolReferences(List<VariableReferenceExpression>)
  + static toSymbolReference(VariableReferenceExpression)
}
class PlanAndMappings {
  - PlanBuilder subPlan
  - Map<Expression, VariableReferenceExpression> mappings
}
QueryPlanner "1" *-- "1" PlanAndMappings
Loading

Class diagram for SimplePlanRewriter context changes

classDiagram
class SimplePlanRewriter {
  + getNodeRewriter()
}
class Context {
  + getNodeRewriter()
}
Context "1" *-- "1" SimplePlanRewriter
Loading

File-Level Changes

Change Details Files
Enhance TableFunctionNode to support copartitioning and immutability
  • Add copartitioningLists field with JsonProperty and propagate through constructors
  • Replace raw collections with ImmutableList/ImmutableMap copies
  • Update replaceChildren and assignStatsEquivalentPlanNode to include copartitioningLists
presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java
Refactor output variables and introduce pass-through specification
  • Modify getOutputVariables to merge declared outputs with pass-through columns
  • Add PassThroughSpecification and PassThroughColumn nested classes with validation
  • Introduce getProperOutputs and getCopartitioningLists accessors
presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java
Revise TableArgumentProperties to carry richer metadata
  • Add argumentName, replace boolean passThroughColumns with PassThroughSpecification
  • Include requiredColumns list and rename specification getter
  • Adjust JsonCreator signature and add corresponding getters
presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java
Update testing infrastructure and TestAnalyzer expectations
  • Standardize connector table function name constants and remove hard-coded strings
  • Rename two_arguments_function to two_scalar_arguments_function and adjust error position offsets
  • Add assertion changes for required column error in TestAnalyzer
presto-main-base/src/test/java/com/facebook/presto/connector/tvf/TestingTableFunctions.java
presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestAnalyzer.java
Expose and fix planner/analyzer/parser utilities
  • Make QueryPlanner.coerce and symbol conversion methods public and add toSymbolReference
  • Remove unused Field.newUnqualified
  • Add getNodeRewriter in SimplePlanRewriter
  • Implement createCatalog in LocalQueryRunner via nodeManager/connectorManager
  • Fix verifyRequiredColumns to use visibleFieldCount
  • Adjust AstBuilder descriptor field parsing for nullable types
  • Propagate copartitioningLists in RelationPlanner and UnaliasSymbolReferences
presto-main-base/src/main/java/com/facebook/presto/sql/planner/QueryPlanner.java
presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/Field.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/SimplePlanRewriter.java
presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java
presto-parser/src/main/java/com/facebook/presto/sql/parser/AstBuilder.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/RelationPlanner.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/UnaliasSymbolReferences.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Co-authored-by: kasiafi <30203062+kasiafi@users.noreply.github.com>
Co-authored-by: Xin Zhang <desertsxin@gmail.com>
@mohsaka mohsaka marked this pull request as ready for review October 31, 2025 23:44
@prestodb-ci prestodb-ci requested review from a team, Dilli-Babu-Godari and ShahimSharafudeen and removed request for a team October 31, 2025 23:44
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Add explicit requireNonNull checks for copartitioningLists and its nested lists in the TableFunctionNode constructor to guard against null inputs and avoid potential NPEs.
  • Consider renaming PassThroughColumn.getOutputVariables to a singular form like getOutputVariable to better reflect that it returns a single variable and reduce naming confusion.
  • Review the decision to make QueryPlanner.coerce, toSymbolReferences, and toSymbolReference public—if they are only needed for testing or internal use, consider annotating them with @VisibleForTesting or keeping them non-public.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Add explicit requireNonNull checks for copartitioningLists and its nested lists in the TableFunctionNode constructor to guard against null inputs and avoid potential NPEs.
- Consider renaming PassThroughColumn.getOutputVariables to a singular form like getOutputVariable to better reflect that it returns a single variable and reduce naming confusion.
- Review the decision to make QueryPlanner.coerce, toSymbolReferences, and toSymbolReference public—if they are only needed for testing or internal use, consider annotating them with @VisibleForTesting or keeping them non-public.

## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:78-85` </location>
<code_context>
-        this.outputVariables = requireNonNull(outputVariables, "outputVariables is null");
-        this.sources = requireNonNull(sources, "sources is null");
-        this.tableArgumentProperties = requireNonNull(tableArgumentProperties, "tableArgumentProperties is null");
+        this.arguments = ImmutableMap.copyOf(arguments);
+        this.outputVariables = ImmutableList.copyOf(outputVariables);
+        this.sources = ImmutableList.copyOf(sources);
+        this.tableArgumentProperties = ImmutableList.copyOf(tableArgumentProperties);
+        this.copartitioningLists = copartitioningLists.stream()
+                .map(ImmutableList::copyOf)
+                .collect(toImmutableList());
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Immutable collections are used for constructor arguments, but copartitioningLists is not null-checked.

Add a requireNonNull check for copartitioningLists to prevent possible NullPointerExceptions and maintain consistency with other constructor arguments.

```suggestion
        this.arguments = ImmutableMap.copyOf(arguments);
        this.outputVariables = ImmutableList.copyOf(outputVariables);
        this.sources = ImmutableList.copyOf(sources);
        this.tableArgumentProperties = ImmutableList.copyOf(tableArgumentProperties);
        this.copartitioningLists = requireNonNull(copartitioningLists, "copartitioningLists is null").stream()
                .map(ImmutableList::copyOf)
                .collect(toImmutableList());
        this.handle = requireNonNull(handle, "handle is null");
```
</issue_to_address>

### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:116` </location>
<code_context>
+        return variables.build();
+    }
+
+    public List<VariableReferenceExpression> getProperOutputs()
     {
         return outputVariables;
</code_context>

<issue_to_address>
**suggestion:** Method name getProperOutputs is ambiguous.

Consider renaming to getOutputVariablesOnly or a more descriptive name, since getOutputVariables is overridden with different logic.

Suggested implementation:

```java
    public List<VariableReferenceExpression> getOutputVariablesOnly()
    {
        return outputVariables;
    }

```

If there are any calls to `getProperOutputs()` elsewhere in this file or in other files, they should be updated to `getOutputVariablesOnly()` to match the new method name.
</issue_to_address>

### Comment 3
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:265` </location>
<code_context>
+
+    public static class PassThroughColumn
+    {
+        private final VariableReferenceExpression outputVariables;
+        private final boolean isPartitioningColumn;
+
</code_context>

<issue_to_address>
**nitpick:** Field name outputVariables in PassThroughColumn is misleading.

Since outputVariables is a single VariableReferenceExpression, renaming it to outputVariable would improve clarity and maintain naming consistency.
</issue_to_address>

### Comment 4
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/plan/TableFunctionNode.java:278` </location>
<code_context>
+        }
+
+        @JsonProperty
+        public VariableReferenceExpression getOutputVariables()
+        {
+            return outputVariables;
</code_context>

<issue_to_address>
**nitpick:** Method getOutputVariables in PassThroughColumn should be getOutputVariable.

Rename the getter to getOutputVariable for consistency with the field name and to distinguish it from the plural method in the parent class.
</issue_to_address>

### Comment 5
<location> `presto-main-base/src/test/java/com/facebook/presto/connector/tvf/TestingTableFunctions.java:539-77` </location>
<code_context>
+    public static class DifferentArgumentTypesFunction
</code_context>

<issue_to_address>
**suggestion (testing):** Missing negative/edge case tests for DifferentArgumentTypesFunction.

Please add tests for error conditions and edge cases, such as missing or invalid arguments, to improve validation coverage for this function.

Suggested implementation:

```java
    public static class DifferentArgumentTypesFunction
            extends AbstractConnectorTableFunction
    {
        public static final String FUNCTION_NAME = "different_arguments_function";
        public DifferentArgumentTypesFunction()
        {
            super(
                    SCHEMA_NAME,
                    FUNCTION_NAME,
                    ImmutableList.of(
                            TableArgumentSpecification.builder()

        // Edge/negative case tests for DifferentArgumentTypesFunction
        @Test(expected = IllegalArgumentException.class)
        public void testMissingRequiredArgument()
        {
            // Attempt to call the function with missing required argument
            // This should throw an IllegalArgumentException or similar
            DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
            // Simulate missing argument (actual invocation may differ based on API)
            function.apply(/* missing required argument */);
        }

        @Test(expected = ClassCastException.class)
        public void testInvalidArgumentType()
        {
            // Attempt to call the function with an argument of the wrong type
            DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
            // Simulate invalid type (actual invocation may differ based on API)
            function.apply("invalid_type_instead_of_expected");
        }

        @Test
        public void testBoundaryValues()
        {
            DifferentArgumentTypesFunction function = new DifferentArgumentTypesFunction();
            // Test with boundary values, e.g., empty table, nulls, etc.
            try {
                function.apply(/* boundary value, e.g., empty table or null */);
            } catch (Exception e) {
                // Assert that the function handles boundary values gracefully
                assertTrue(e instanceof IllegalArgumentException || e instanceof NullPointerException);
            }
        }

```

You may need to adjust the `function.apply(...)` calls to match the actual method signature and argument requirements of `DifferentArgumentTypesFunction`. Also, ensure that the test annotations (`@Test`) are compatible with your test framework (e.g., JUnit).
</issue_to_address>

### Comment 6
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestAnalyzer.java:1971` </location>
<code_context>
-        analyze("SELECT * FROM TABLE(system.two_arguments_function(text => 'foo', number => 1))");
+        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
+        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");
+        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', 1))");
+        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => 1))");

</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding tests for edge cases with null and default arguments for two_scalar_arguments_function.

Please add tests for cases with default values, single arguments, and combinations of null and default values to fully cover input scenarios for two_scalar_arguments_function.

Suggested implementation:

```java
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");

        // Edge cases: null and default arguments
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, 1))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', NULL))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => NULL, number => 1))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => NULL))");

        // If the function supports default values, test with omitted arguments
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function())"); // both default
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => DEFAULT, number => DEFAULT))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(DEFAULT, DEFAULT))");

        // Combinations of null and default
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, DEFAULT))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(DEFAULT, NULL))");

        // Single argument cases (already present, but ensure both positional and named)
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo'))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo'))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function('foo', 1))");
        analyze("SELECT * FROM TABLE(system.two_scalar_arguments_function(text => 'foo', number => 1))");

        // Invalid cases: too many arguments, all null, etc.
        assertFails(TABLE_FUNCTION_INVALID_ARGUMENTS, "line 1:58: Too many arguments. Expected at most 2 arguments, got 3 arguments", "SELECT * FROM TABLE(system.two_scalar_arguments_function(1, 2, 3))");
        assertFails(TABLE_FUNCTION_INVALID_ARGUMENTS, "line 1:58: All arguments are null", "SELECT * FROM TABLE(system.two_scalar_arguments_function(NULL, NULL))");

```

- If `DEFAULT` is not a valid keyword in your SQL dialect, replace it with the appropriate syntax for default arguments.
- If the function does not support omitted arguments or defaults, remove those tests or adjust them to expect failures.
- Ensure that the error messages in `assertFails` match the actual error messages produced by your analyzer for null/default misuse.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
return variables.build();
}

public List<VariableReferenceExpression> getProperOutputs()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Method name getProperOutputs is ambiguous.

Consider renaming to getOutputVariablesOnly or a more descriptive name, since getOutputVariables is overridden with different logic.

Suggested implementation:

    public List<VariableReferenceExpression> getOutputVariablesOnly()
    {
        return outputVariables;
    }

If there are any calls to getProperOutputs() elsewhere in this file or in other files, they should be updated to getOutputVariablesOnly() to match the new method name.


public static class PassThroughColumn
{
private final VariableReferenceExpression outputVariables;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Field name outputVariables in PassThroughColumn is misleading.

Since outputVariables is a single VariableReferenceExpression, renaming it to outputVariable would improve clarity and maintain naming consistency.

}

@JsonProperty
public VariableReferenceExpression getOutputVariables()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Method getOutputVariables in PassThroughColumn should be getOutputVariable.

Rename the getter to getOutputVariable for consistency with the field name and to distinguish it from the plural method in the parent class.

@mohsaka mohsaka requested a review from aditi-pandit October 31, 2025 23:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

2 participants