sql: incorrect outer join parsing #6875

asenac · 2021-05-27T07:34:44Z

What version of Materialize are you using?

$ materialized -v
materialized v0.7.4-dev (19c296dcd)

How did you install Materialize?

What was the issue?

The following query

materialize=> explain raw plan for select * from
    foo,                       
    bar left join baz on y = z;
;
              Raw Plan              
------------------------------------
 %0 =                              +
 | Get materialize.public.foo (u9) +
                                   +
 %1 =                              +
 | Get materialize.public.bar (u11)+
                                   +
 %2 =                              +
 | InnerJoin %0 %1 on true         +
                                   +
 %3 =                              +
 | Get materialize.public.baz (u13)+
                                   +
 %4 =                              +
 | LeftOuterJoin %2 %3 on (#1 = #2)+
(1 row)
materialize=>

should be syntactically equivalent to this other one:

materialize=> explain raw plan for select * from
    foo,
    (select * from bar left join baz on y = z);
;
              Raw Plan              
------------------------------------
 %0 =                              +
 | Get materialize.public.foo (u9) +
                                   +
 %1 =                              +
 | Get materialize.public.bar (u11)+
                                   +
 %2 =                              +
 | Get materialize.public.baz (u13)+
                                   +
 %3 =                              +
 | LeftOuterJoin %1 %2 on (#0 = #1)+
                                   +
 %4 =                              +
 | InnerJoin %0 %3 on true         +
(1 row)
materialize=>

However, they result in different raw plan, where the joins happen in different order. The right plan should be the one in the second query, since the comma operator has lower precedence.

This also leads to a name resolution issue whereby the sibling tables in the comma join are seen from the ON clause of the outer join, where they shouldn't be seen. Hence, the following query is accepted while it should fail:

materialize=> explain select * from
    foo,
    bar left join baz on foo.y = baz.x;
                Optimized Plan                
----------------------------------------------
 %0 =                                        +
 | Get materialize.public.foo (u412)         +
 | ArrangeBy ()                              +
                                             +
 %1 =                                        +
 | Get materialize.public.bar (u414)         +
                                             +
 %2 = Let l0 =                               +
 | Join %0 %1                                +
 | | implementation = Differential %1 %0.()  +
 | | demand = (#0..#3)                       +
                                             +
 %3 =                                        +
 | Get %2 (l0)                               +
 | Filter !(isnull(#1))                      +
 | ArrangeBy (#1)                            +
                                             +
 %4 =                                        +
 | Get materialize.public.baz (u416)         +
 | Filter !(isnull(#0))                      +
                                             +
 %5 = Let l1 =                               +
 | Join %3 %4 (= #1 #4)                      +
 | | implementation = Differential %4 %3.(#1)+
 | | demand = (#0..#3, #5)                   +
                                             +
 %6 =                                        +
 | Get %5 (l1)                               +
 | Distinct group=(#1)                       +
 | ArrangeBy (#0)                            +
                                             +
 %7 =                                        +
 | Join %2 %6 (= #1 #4)                      +
 | | implementation = Differential %2 %6.(#0)+
 | | demand = (#0..#3)                       +
 | Negate                                    +
 | Project (#0..#3)                          +
                                             +
 %8 =                                        +
 | Union %7 %2                               +
 | Map null, null                            +
                                             +
 %9 =                                        +
 | Get %5 (l1)                               +
 | Project (#0..#3, #1, #5)                  +
                                             +
 %10 =                                       +
 | Union %8 %9                               +
 
(1 row)

materialize=>

... compared to postgres:

asenac=# explain select * from
    foo,
    bar left join baz on foo.y = baz.x;
ERROR:  invalid reference to FROM-clause entry for table "foo"
LINE 3:     bar left join baz on foo.y = baz.x;
                                 ^
HINT:  There is an entry for table "foo", but it cannot be referenced from this part of the query.
asenac=#

Is the issue reproducible? If so, please provide reproduction instructions.

Please attach any applicable log files.

The text was updated successfully, but these errors were encountered:

frankmcsherry · 2021-05-27T11:15:00Z

Also, while this may be less stressful for left join, as I think it commutes and doesn't give wrong answers, this same issue with right join may give incorrect answers if the inner join is applied before the outer join, rather than after.

asenac · 2021-05-27T11:19:28Z

It gives wrong answers for left join in the sense that it accepts queries that are semantically incorrect like the one above which should fail due to foo not being visible from within the ON clause of the left join.

asenac · 2021-05-27T11:49:33Z

The parser seems to do the right thing. The problem seems to be in plan/query.rs where we incorrectly convert the comma-join flatten tree into a left deep tree, making the symbols from the left siblings available:

    // Step 1. Handle FROM clause, including joins.
    let (mut relation_expr, from_scope) =
        from.iter().fold(Ok(plan_join_identity(qcx)), |l, twj| {
            let (left, left_scope) = l?;
            plan_table_with_joins(qcx, left, left_scope, &JoinOperator::CrossJoin, twj)
        })?;

asenac · 2021-05-27T16:11:24Z

Another issue in the name resolution logic:

Postgres:

asenac=# select * from t1 b, (t1 c inner join lateral(select b.f1 from t2) d on true);
 f1 | f2 | f3 | f1 | f2 | f3 | f1 
----+----+----+----+----+----+----
(0 rows)

asenac=#

Materialize:

materialize=> select * from t1 b, (t1 c inner join lateral(select b.f1 from t2) d on true);
ERROR:  column "b.f1" does not exist
materialize=>

asenac · 2021-06-01T10:06:19Z

Fixing this is not trivial. The problem is that for the following join

A, B left join C

the join tree generated by the planner is:

    LJ
   /  \
  IJ   C
 /  \
A    B

but according to the SQL semantics, the join tree generated should be this instead:

    IJ
   /  \
  A    LJ
      /  \
     B    C

ie. the following two queries should lead to the same join tree but they don't:

materialize=> explain raw plan for select * from a, b left join c on b.a = c.a;
              Raw Plan              
------------------------------------
 %0 =                              +
 | Get materialize.public.a (u570) +
                                   +
 %1 =                              +
 | Get materialize.public.b (u572) +
                                   +
 %2 =                              +
 | InnerJoin %0 %1 on true         +
                                   +
 %3 =                              +
 | Get materialize.public.c (u574) +
                                   +
 %4 =                              +
 | LeftOuterJoin %2 %3 on (#2 = #4)+
 
(1 row)

materialize=> explain raw plan for select * from a, (b left join c on b.a = c.a);
              Raw Plan              
------------------------------------
 %0 =                              +
 | Get materialize.public.a (u570) +
                                   +
 %1 =                              +
 | Get materialize.public.b (u572) +
                                   +
 %2 =                              +
 | Get materialize.public.c (u574) +
                                   +
 %3 =                              +
 | LeftOuterJoin %1 %2 on (#0 = #2)+
                                   +
 %4 =                              +
 | InnerJoin %0 %3 on true         +
 
(1 row)

materialize=>

My first attempt at fixing this was about treating all comma-join members as nested joins but that broke lateral joins, for the same reason lateral joins don't work within nested joins as they should:

materialize=> explain raw plan for select * from a, (b left join lateral(select a.a as a from c) c on b.a = c.a);
ERROR:  column "a.a" does not exist
materialize=>

In order for the sibling context to be seen from within a nested join, we need to mark all intermediate joins as LATERAL, which is wrong. So, in order for the query above to work, we need to write it as:

materialize=> explain raw plan for select * from a, lateral(select * from b left join lateral(select a.a as a from c) c on b.a = c.a);
                 Raw Plan                  
-------------------------------------------
 %0 =                                     +
 | Get materialize.public.a (u576)        +
                                          +
 %1 =                                     +
 | Get materialize.public.b (u578)        +
                                          +
 %2 =                                     +
 | Get materialize.public.c (u580)        +
 | Map #^^0                               +
 | Project (#1)                           +
                                          +
 %3 =                                     +
 | LeftOuterLateralJoin %1 %2 on (#0 = #1)+
                                          +
 %4 =                                     +
 | InnerLateralJoin %0 %3 on true         +
 
(1 row)

materialize=>

A few thoughts:

There is no concept of sibling scope for properly supporting lateral joins.
I don't think the representation used for column references (level, column offset) would work with sibling scopes.
The lateral flag used in the HirRelationExpr joins for lateral joins assumes that the right side is correlated wrt the left side, but lateral joins may be correlated with some other table in the sibling scope of some outer scope.
- We should probably traverse the right hand expression in plan_join_operator for finding references to elements in the left hand expression to set that flag, instead of setting it to true for LATERAL joins. In fact, that flag should probably be renamed as 'correlated'.

philip-stoev · 2021-06-01T10:23:10Z

As an interim measure, can we throw an error "mixing of comma joins with XXX" is not allowed?

asenac · 2021-06-02T08:45:13Z

Proposal

In order for lateral joins to be supported properly, which is needed for fixing this join planning issue, we need to support lateral joins in nested joins. The scope of a lateral join operand contains all siblings in the left hand side of the current join, including the left-hand siblings of the ancestors of the current node.

For example, in the following join tree, the lateral scope of G is {B, D, F} , and not just {F} as in our current implementation.

Given that we use ColumnRef::level as the identifier of the scope a given column comes from, my proposal would be for any join to create a new scope, which will be the lateral scope for its children. The parent scope of the lateral scope of a join node will be the lateral scope of the parent join if any, or the outer scope of the query.

pub struct ColumnRef {
    // scope level, where 0 is the current scope and 1+ are outer scopes.
    pub level: usize,
    // level-local column identifier used.
    pub column: usize,
}

Every time we descend a join node, then, the outer scope is then one level farther. Non-lateral join operands must then be aware that the outer scope is not at level 1, but that there are some phantom scopes in between that cannot be seen. This could be modeled by adding a outer_scope_level_offset in Scope struct, defaulted to 1, which is increased and decreased every time we descend a join node and return from it respectively.

We also need to change the semantics of this method:

impl JoinKind {
    pub fn is_lateral(&self) -> bool {
        match self {
            JoinKind::Inner { lateral } | JoinKind::LeftOuter { lateral } => *lateral,
            JoinKind::RightOuter | JoinKind::FullOuter => false,
        }
    }
}

The lateral flag currently always comes from the homonyms flag in the syntax tree, and it serves two purposes:

indicates that the current join creates a new scope.
indicates that the right hand side is correlated with the left hand side.

However, in order to support lateral joins in nested joins, as we have mentioned, all joins will create a new scope, but also that the right hand side might be correlated with a sibling node of some ancestor. In that case, we need to mark as lateral, not the current join, but the ancestor join the correlated symbol is coming from. We can detect this by visiting all columns in the right hand side in plan_join_operator finding any column with col.level - depth == 1. The value of the flag will then be the result of this correlation analysis rather than propagating a flag from the AST.

Lastly, we would need to modify lowering so that the ColumnMap is always updated for every join, and not only for joins marked as lateral, to increase the level of the outer columns by 1.

@benesch, any comments on this proposal? Am I missing anything?

Personally, I find this Hir representation a bit hard to deal with, since the code generating it and the code consuming it need to agree on what introduces a new scope, without that contract being explicit anywhere. I'd prefer using a Query Graph Model representation, which column references consist in a pointer to the referenced quantifier (join operand) and the column offset within the columns of that join operand (see example of a lateral join below). This Query Graph Model structure also contains all the information for name resolution purposes.

asenac · 2021-06-02T10:15:39Z

A few examples of the raw plan after this proposal.

In the following one, the LATERAL join operand is correlated with the previous table in the comma join, and hence, that is the join marked as LATERAL (correlated), since its right hand side is correlated with its left hand side.

Double correlation. In this other example, the lateral join operand is correlated with both its sibling in the left join and the sibling of the left join in the comma join. Hence, both joins are marked as LATERAL (again, meaning correlated).

Edit: Note that the join with the identity tuple can be eliminated in most cases leaving the plan as follows:

asenac · 2021-06-02T10:35:51Z

Note that this HirRelationExpr that treats all joins as binary joins leads to non-optimal dataflow plans for queries like:

select * from a, b, lateral(select * from c where a.a = c.a)...

since it leads to a correlated inner join between C and the result of joining A and B. However, ideally we should only evaluate C for every value of A. In the query graph model, which is a representation closer to the SQL semantics, comma joins are represented as N-way cross joins:

With such representation, planning join orders based on the dependencies of its operands is trivial. Even nested joins can be flattened to ease this dependency analysis needed for efficient join planning. The following example shows how a query with a nested join before and after being flattened:

Another criticizable aspect HirRelationExpr representation is that it imports too many concepts from the syntactic representation, such as the visibility scopes or the concept of LEFT and RIGHT. I think there is a lot of reasoning and query simplification that should happen at that stage, before lowering, which would benefit from QGM-like representation.

asenac · 2021-06-03T09:22:15Z

For context, the following query was the one being analyzed when this issue was spotted:

materialize=> with dataflow_address(address) as (select address[1] from mz_dataflow_names n, mz_dataflow_operator_addresses o where n.id = o.id and n.name like 'Dataflow: %delta_join%'),
     dataflow_addresses as (select * from mz_dataflow_operator_addresses where address[1] = (select address from dataflow_address)),
     dataflow_channels as (select c.id, o.address, source_node, target_node from mz_dataflow_channels c, mz_dataflow_operator_addresses o where o.id = c.id and c.id in (select id from dataflow_addresses)),
     dataflow_operators as (select name, address, n.id from mz_dataflow_operators n, dataflow_addresses a where n.id = a.id)
select count(*)
from
    dataflow_operators src,
    dataflow_operators dst,
    dataflow_channels c left join mz_message_counts mc on c.id = mc.channel
where
    c.address = src.address[1:list_length(src.address) - 1]
    and c.address = dst.address[1:list_length(dst.address) - 1]
    and c.source_node = src.address[list_length(src.address)]
    and c.target_node = dst.address[list_length(dst.address)];
 count 
-------
    63
(1 row)

Time: 10139.692 ms (00:10.140)
materialize=> with dataflow_address(address) as (select address[1] from mz_dataflow_names n, mz_dataflow_operator_addresses o where n.id = o.id and n.name like 'Dataflow: %delta_join%'),
     dataflow_addresses as (select * from mz_dataflow_operator_addresses where address[1] = (select address from dataflow_address)),
     dataflow_channels as (select c.id, o.address, source_node, target_node from mz_dataflow_channels c, mz_dataflow_operator_addresses o where o.id = c.id and c.id in (select id from dataflow_addresses)),
     dataflow_operators as (select name, address, n.id from mz_dataflow_operators n, dataflow_addresses a where n.id = a.id)
select count(*)
from
    dataflow_operators src,
    dataflow_operators dst,
    (dataflow_channels c left join mz_message_counts mc on c.id = mc.channel)
where
    c.address = src.address[1:list_length(src.address) - 1]
    and c.address = dst.address[1:list_length(dst.address) - 1]
    and c.source_node = src.address[list_length(src.address)]
    and c.target_node = dst.address[list_length(dst.address)];
 count 
-------
    63
(1 row)

Time: 50.900 ms
materialize=>

The second version is of the query just wraps the left join with parenthesis, which makes the planner treat it as a derived join and hence, it doesn't push the comma join into the the left hand side of the left join. As shown above, that makes the query significantly faster (200x faster).

Note that the only 'real' issue in the planner is the lack of proper support for lateral joins, since even if the join tree built for the first query doesn't correspond with the pure SQL interpretation of the query, it is semantically equivalent, and hence, the optimizer should be able to produce the same plan for both. So, even after fixing the join tree issue we should investigate why the optimizer wasn't able to reduce both queries to the same plan. This is a typical star-query that may come in different flavors and, although the most common one is the one below, it's not unlikely that they will come in all sort of flavors.

materialize=> with dataflow_address(address) as (select address[1] from mz_dataflow_names n, mz_dataflow_operator_addresses o where n.id = o.id and n.name like 'Dataflow: %delta_join%'),
     dataflow_addresses as (select * from mz_dataflow_operator_addresses where address[1] = (select address from dataflow_address)),
     dataflow_channels as (select c.id, o.address, source_node, target_node from mz_dataflow_channels c, mz_dataflow_operator_addresses o where o.id = c.id and c.id in (select id from dataflow_addresses)),
     dataflow_operators as (select name, address, n.id from mz_dataflow_operators n, dataflow_addresses a where n.id = a.id)
select count(*)
from
    dataflow_channels c 
    inner join dataflow_operators src on
        c.address = src.address[1:list_length(src.address) - 1]
        and c.source_node = src.address[list_length(src.address)]
    inner join dataflow_operators dst on 
        c.address = dst.address[1:list_length(dst.address) - 1]
        and c.target_node = dst.address[list_length(dst.address)]
    left join mz_message_counts mc on c.id = mc.channel;
 count 
-------
    63
(1 row)

Time: 51.781 ms
materialize=>

asenac · 2021-06-03T09:32:05Z

This seems to be a predicate pushdown issue when the join predicates for the inner joins are put in the where clause. The following query builds the same join structure like the one above but, by just putting the predicates in the where clause, we get again poor performance.

materialize=> with dataflow_address(address) as (select address[1] from mz_dataflow_names n, mz_dataflow_operator_addresses o where n.id = o.id and n.name like 'Dataflow: %delta_join%'),
materialize->      dataflow_addresses as (select * from mz_dataflow_operator_addresses where address[1] = (select address from dataflow_address)),
materialize->      dataflow_channels as (select c.id, o.address, source_node, target_node from mz_dataflow_channels c, mz_dataflow_operator_addresses o where o.id = c.id and c.id in (select id from dataflow_addresses)),
materialize->      dataflow_operators as (select name, address, n.id from mz_dataflow_operators n, dataflow_addresses a where n.id = a.id)
materialize-> select count(*)
materialize-> from
materialize->     dataflow_operators src inner join
materialize->     dataflow_operators dst on true inner join
materialize->     dataflow_channels c on true left join mz_message_counts mc on c.id = mc.channel
materialize-> where
materialize->     c.address = src.address[1:list_length(src.address) - 1]
materialize->     and c.address = dst.address[1:list_length(dst.address) - 1]
materialize->     and c.source_node = src.address[list_length(src.address)]
materialize->     and c.target_node = dst.address[list_length(dst.address)];
 count 
-------
    63
(1 row)

Time: 10532.136 ms (00:10.532)
materialize=>

asenac · 2021-06-17T07:43:46Z

Example of wrong results due to this issue:

materialize=> select * from a, b full outer join c on b = c;
 a | b | c 
---+---+---
   |   | 3
 1 | 2 |  
(2 rows)

materialize=>

asenac=# select * from a, b full outer join c on b = c;
 a | b | c 
---+---+---
 1 | 2 |  
 1 |   | 3
(2 rows)

asenac=#

As Frank mentioned above, non-left-commutative joins, like right join or full outer join, lead to wrong results.

uce · 2021-07-14T15:45:40Z

In the dataflow weekly we decided to error out instead of fixing the root cause. Fixing the root cause should become much easier after the planned QGM work progresses.

Fixes MaterializeInc#6875

sploiselle · 2021-12-14T03:41:08Z

Just want to ping this issue that I have a potential solution for the "incorrect" part of the planning issues in c2a4caa, though it obviously doesn't move the needle on the scoping complexities we're ill-equipped to handle. Very welcome to get any eyes on #9489.

We previously treated comma joins as cross joins, but comma joins should instead be generally handled as nested cross joins to support PG's join precedence semantics. However, our current scope and lateral join implementations preclude this approach from working for relations that rely on lateral joins– nesting prohibits the rhs from seeing the lhs' items. In this case, we continue to error as we did before. Improves MaterializeInc#6875

benesch · 2021-12-15T06:57:47Z

I'm going to reopen this because I'm finding it hard to reason about the correctness of our current LATERAL planning. WIP branch here: db643f8

We previously treated comma joins as cross joins, but comma joins should instead be generally handled as nested cross joins to support PG's join precedence semantics. However, our current scope and lateral join implementations preclude this approach from working for relations that rely on lateral joins– nesting prohibits the rhs from seeing the lhs' items. In this case, we continue to error as we did before. Improves MaterializeInc#6875

@asenac

Rework join planning so that we can handle lateral references that are mixed with right and full outer joins. The implementation here largely follows what @asenac laid out in @6875, except that the existing outer scope design was found to be sufficient; that is, there is no concept of a "sibling" scope introduced here. The gist is that we now always plan joins with the RHS as a child scope of the LHS. (Previously, we only created the new scope for joins if we could eagerly notice that the RHS had a lateral element.) If the join is a right or full join, we ban referencing any of the elements in the left scope, as required by SQL:2008. When we encounter a non-lateral subquery, we hide any lateral scope items entirely. Yes, it's weird that non-lateral subqueries pretend lateral scope items don't exit, while subqueries on the RHS of a right or full join *error* if they access a lateral scope item, but that's what SQL demands. The lowering code is amended to always treat inner and left joins as correlated, even if they are not. This is correct, but it may produce worse plans. I haven't investigated the movement of plans in much detail yet. Fix MaterializeInc#6875.

@asenac

Rework join planning so that we can handle lateral references that are mixed with right and full outer joins. The implementation here largely follows what @asenac laid out in @6875, except that the existing outer scope design was found to be sufficient; that is, there is no concept of a "sibling" scope introduced here. The gist is that we now always plan joins with the RHS as a child scope of the LHS. (Previously, we only created the new scope for joins if we could eagerly notice that the RHS had a lateral element.) If the join is a right or full join, we ban referencing any of the elements in the left scope, as required by SQL:2008. When we encounter a non-lateral subquery, we hide any lateral scope items entirely. Yes, it's weird that non-lateral subqueries pretend lateral scope items don't exit, while subqueries on the RHS of a right or full join *error* if they access a lateral scope item, but that's what SQL demands. The lowering code is amended to always treat inner and left joins as correlated, even if they are not. This is correct, but it may produce worse plans. I haven't investigated the movement of plans in much detail yet. Fix MaterializeInc#6875.

@asenac

Rework join planning so that we can handle lateral references that are mixed with right and full outer joins. The implementation here largely follows what @asenac laid out in @6875, except that the existing outer scope design was found to be sufficient; that is, there is no concept of a "sibling" scope introduced here. The gist is that we now always plan joins with the RHS as a child scope of the LHS. (Previously, we only created the new scope for joins if we could eagerly notice that the RHS had a lateral element.) If the join is a right or full join, we ban referencing any of the elements in the left scope, as required by SQL:2008. When we encounter a non-lateral subquery, we hide any lateral scope items entirely. Yes, it's weird that non-lateral subqueries pretend lateral scope items don't exit, while subqueries on the RHS of a right or full join *error* if they access a lateral scope item, but that's what SQL demands. The lowering code is amended to always treat inner and left joins as correlated, even if they are not. This is correct, but it may produce worse plans. I haven't investigated the movement of plans in much detail yet. Fix MaterializeInc#6875.

@asenac

Rework join planning so that we can handle lateral references that are mixed with right and full outer joins. The implementation here largely follows what @asenac laid out in @6875, except that the existing outer scope design was found to be sufficient; that is, there is no concept of a "sibling" scope introduced here. The gist is that we now always plan joins with the RHS as a child scope of the LHS. (Previously, we only created the new scope for joins if we could eagerly notice that the RHS had a lateral element.) If the join is a right or full join, we ban referencing any of the elements in the left scope, as required by SQL:2008. When we encounter a non-lateral subquery, we hide any lateral scope items entirely. Yes, it's weird that non-lateral subqueries pretend lateral scope items don't exit, while subqueries on the RHS of a right or full join *error* if they access a lateral scope item, but that's what SQL demands. The lowering code is amended to always treat inner and left joins as correlated, even if they are not. This is correct, but it may produce worse plans. I haven't investigated the movement of plans in much detail yet. Fix MaterializeInc#6875.

@asenac

Rework join planning so that we can handle lateral references that are mixed with right and full outer joins. The implementation here largely follows what @asenac laid out in @6875, except that the existing outer scope design was found to be sufficient; that is, there is no concept of a "sibling" scope introduced here. The gist is that we now always plan joins with the RHS as a child scope of the LHS. (Previously, we only created the new scope for joins if we could eagerly notice that the RHS had a lateral element.) If the join is a right or full join, we ban referencing any of the elements in the left scope, as required by SQL:2008. When we encounter a non-lateral subquery, we hide any lateral scope items entirely. Yes, it's weird that non-lateral subqueries pretend lateral scope items don't exit, while subqueries on the RHS of a right or full join *error* if they access a lateral scope item, but that's what SQL demands. The lowering code is amended to always treat inner and left joins as correlated, even if they are not. This is correct, but it may produce worse plans. I haven't investigated the movement of plans in much detail yet. Fix MaterializeInc#6875.

@asenac

Rework join planning so that we can handle lateral references that are mixed with right and full outer joins. The implementation here largely follows what @asenac laid out in @6875, except that the existing outer scope design was found to be sufficient; that is, there is no concept of a "sibling" scope introduced here. The gist is that we now always plan joins with the RHS as a child scope of the LHS. (Previously, we only created the new scope for joins if we could eagerly notice that the RHS had a lateral element.) If the join is a right or full join, we ban referencing any of the elements in the left scope, as required by SQL:2008. When we encounter a non-lateral subquery, we hide any lateral scope items entirely. Yes, it's weird that non-lateral subqueries pretend lateral scope items don't exit, while subqueries on the RHS of a right or full join *error* if they access a lateral scope item, but that's what SQL demands. Fix MaterializeInc#6875.

@asenac

Rework join planning so that we can handle lateral references that are mixed with right and full outer joins. The implementation here largely follows what @asenac laid out in @6875, except that the existing outer scope design was found to be sufficient; that is, there is no concept of a "sibling" scope introduced here. The gist is that we now always plan joins with the RHS as a child scope of the LHS. (Previously, we only created the new scope for joins if we could eagerly notice that the RHS had a lateral element.) If the join is a right or full join, we ban referencing any of the elements in the left scope, as required by SQL:2008. When we encounter a non-lateral subquery, we hide any lateral scope items entirely. Yes, it's weird that non-lateral subqueries pretend lateral scope items don't exit, while subqueries on the RHS of a right or full join *error* if they access a lateral scope item, but that's what SQL demands. Fix MaterializeInc#6875.

asenac added C-bug Category: something is broken T-correctness Theme: relates to consistency and correctness of results. labels May 27, 2021

frankmcsherry added this to To Do in Top priority via automation May 27, 2021

awang assigned asenac May 27, 2021

benesch added the Top Priority label Jun 2, 2021

benesch added this to In Progress in SQL and Coordinator Jun 2, 2021

cuongdo added this to In progress in Compute Jun 2, 2021

asenac mentioned this issue Jun 3, 2021

optimizer: unable to push down predicates through outer joins to the preserving side #6932

Open

benesch added P0 critical issue, fix immediately, associated with bugs that demand immediate attention and removed Top Priority labels Jun 7, 2021

asenac moved this from In progress to To do in Compute Jul 6, 2021

uce moved this from In Progress to To do in SQL and Coordinator Jul 14, 2021

elindsey removed this from To do in SQL and Coordinator Jul 16, 2021

asenac linked a pull request Jul 20, 2021 that will close this issue

Throw an error for unsupported outer joins in comma join #7474

Merged

asenac closed this as completed in #7474 Jul 21, 2021

Compute automation moved this from To do to Done Jul 21, 2021

sploiselle pushed a commit to sploiselle/materialize that referenced this issue Dec 13, 2021

sql: ast rewrite for comma joins

9a9a215

Fixes MaterializeInc#6875

sploiselle mentioned this issue Dec 13, 2021

sql: refactor USING constraint planning #9489

Merged

2 tasks

sploiselle pushed a commit to sploiselle/materialize that referenced this issue Dec 14, 2021

sql: ast rewrite for comma joins

c2a4caa

Fixes MaterializeInc#6875

benesch reopened this Dec 15, 2021

Compute automation moved this from Done to In progress Dec 15, 2021

benesch mentioned this issue Dec 16, 2021

sql: correct lateral scoping in the face of outer joins #9623

Merged

2 tasks

benesch closed this as completed in #9623 Dec 17, 2021

Compute automation moved this from In progress to Done Dec 17, 2021

Jan	AUG	Sep
	22
2023	2024	2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: incorrect outer join parsing #6875

sql: incorrect outer join parsing #6875

asenac commented May 27, 2021

frankmcsherry commented May 27, 2021

asenac commented May 27, 2021

asenac commented May 27, 2021

asenac commented May 27, 2021

asenac commented Jun 1, 2021 •

edited

Loading

philip-stoev commented Jun 1, 2021

asenac commented Jun 2, 2021 •

edited

Loading

asenac commented Jun 2, 2021 •

edited

Loading

asenac commented Jun 2, 2021 •

edited

Loading

asenac commented Jun 3, 2021

asenac commented Jun 3, 2021

asenac commented Jun 17, 2021

uce commented Jul 14, 2021

sploiselle commented Dec 14, 2021

benesch commented Dec 15, 2021

sql: incorrect outer join parsing #6875

sql: incorrect outer join parsing #6875

Comments

asenac commented May 27, 2021

What version of Materialize are you using?

How did you install Materialize?

What was the issue?

Is the issue reproducible? If so, please provide reproduction instructions.

Please attach any applicable log files.

frankmcsherry commented May 27, 2021

asenac commented May 27, 2021

asenac commented May 27, 2021

asenac commented May 27, 2021

asenac commented Jun 1, 2021 • edited Loading

philip-stoev commented Jun 1, 2021

asenac commented Jun 2, 2021 • edited Loading

Proposal

asenac commented Jun 2, 2021 • edited Loading

asenac commented Jun 2, 2021 • edited Loading

asenac commented Jun 3, 2021

asenac commented Jun 3, 2021

asenac commented Jun 17, 2021

uce commented Jul 14, 2021

sploiselle commented Dec 14, 2021

benesch commented Dec 15, 2021

asenac commented Jun 1, 2021 •

edited

Loading

asenac commented Jun 2, 2021 •

edited

Loading

asenac commented Jun 2, 2021 •

edited

Loading

asenac commented Jun 2, 2021 •

edited

Loading