@fivetran-kwoodbeck
Collaborator

fivetran-kwoodbeck commented Dec 17, 2025

DuckDB supports left and right shift (<< and >>) on integers up to INT128, but it assumes INT32 by default, so operands need to be cast to keep it from throwing errors in various situations. See the Jira tickets for full testing.

Snowflake claims it always returns an INT128 (for INT input). In practice, the result type scales with the returned value, ranging from NUMBER(3,0) to NUMBER(38,0). Snowflake also supports BINARY input.

Snowflake does not throw an error when a shift exceeds the maximum integer value; DuckDB does.
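
For illustration only, here is a toy Python model of why the cast matters: a checked left shift that raises once the result no longer fits a signed integer of a given width. The helper name `checked_shl` and its range check are assumptions made for this sketch; they are not DuckDB's implementation and may not match its exact overflow rule.

```python
def checked_shl(value: int, shift: int, bits: int) -> int:
    """Left-shift `value`, raising if the result does not fit a signed `bits`-bit integer.

    Hypothetical helper: a simplified model, not DuckDB's exact rule.
    """
    if shift < 0:
        raise ValueError("shift must be non-negative")
    result = value << shift  # Python ints are unbounded, so this never wraps
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= result <= hi:
        raise OverflowError(f"{value} << {shift} does not fit in INT{bits}")
    return result


# 1 << 40 fits comfortably in 128 bits...
wide = checked_shl(1, 40, bits=128)

# ...but not in 32 bits, which is the situation the cast is meant to avoid.
try:
    checked_shl(1, 40, bits=32)
    narrow_overflowed = False
except OverflowError:
    narrow_overflowed = True
```

In this model the same shift succeeds or fails purely depending on the assumed width, which is the motivation for widening the operand before shifting.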

Documentation:
https://docs.snowflake.com/en/sql-reference/functions/bitshiftleft
https://docs.snowflake.com/en/sql-reference/functions/bitshiftright

@github-actions
Contributor

github-actions bot commented Dec 17, 2025

SQLGlot Integration Test Results

Comparing:

  • this branch (sqlglot:feature/transpile-bitshift, sqlglot version: feature/transpile-bitshift)
  • baseline (main, sqlglot version: 28.5.1.dev43)

⚠️ Limited to dialects: duckdb, snowflake

By Dialect

| dialect | main | sqlglot:feature/transpile-bitshift | difference | links |
| --- | --- | --- | --- | --- |
| duckdb -> duckdb | 4003/4003 passed (100.0%) | 4003/4003 passed (100.0%) | No change | full result / delta |
| snowflake -> duckdb | 626/1085 passed (57.7%) | 626/1085 passed (57.7%) | No change | full result / delta |
| snowflake -> snowflake | 981/1085 passed (90.4%) | 981/1085 passed (90.4%) | No change | full result / delta |

Overall

main: 6173 total, 5610 passed (pass rate: 90.9%), sqlglot version: 28.5.1.dev43

sqlglot:feature/transpile-bitshift: 6173 total, 5610 passed (pass rate: 90.9%), sqlglot version: feature/transpile-bitshift

Difference: No change

Collaborator

@georgesittas left a comment

This needs testing, but I'm skeptical about the approach. May need to sync in Slack about this.

@fivetran-kwoodbeck
Collaborator Author

> This needs testing, but I'm skeptical about the approach. May need to sync in Slack about this.

@georgesittas Please see the Jira ticket for all the tests that were run prior to the first push. About 300 test queries were run against Snowflake, transpiled, and run against DuckDB, and the two sets of results were checked against one another.

@georgesittas
Collaborator

We need to add tests.

fivetran-kwoodbeck force-pushed the feature/transpile-bitshift branch from 9f96195 to 21052b5 on December 19, 2025 21:30
fivetran-kwoodbeck force-pushed the feature/transpile-bitshift branch from 21052b5 to 7c4b5e3 on December 22, 2025 21:42
georgesittas force-pushed the feature/transpile-bitshift branch from 83f7591 to 526cd22 on December 24, 2025 09:09
@georgesittas
Collaborator

@fivetran-kwoodbeck @VaggelisD take a look at the refactored PR when you get the chance; I took a stab at cleaning it up a bit.

Collaborator

@VaggelisD left a comment

I only have one question. @fivetran-kwoodbeck, if the refactored logic and tests check out, feel free to merge.

Comment on lines 1751 to 1755
self.validate_all(
    "SELECT BITSHIFTRIGHT(X'FF', 4)",
    write={
        "snowflake": "SELECT BITSHIFTRIGHT(255, 4)",
        "duckdb": "SELECT CAST(255 AS INT128) >> 4",
    },
)
Collaborator

This generates a BINARY in Snowflake but an INT in DuckDB. Is this intentional, given that we don't annotate the types?

sf> SELECT system$typeof(BITSHIFTRIGHT(X'FF', 4)), BITSHIFTRIGHT(X'FF', 4);
BINARY[LOB] | 0F
-- | --


duckdb> SELECT CAST(255 AS INT128) >> 4;
┌─────────────────────────────┐
│ (CAST(255 AS HUGEINT) >> 4) │
│           int128            │
├─────────────────────────────┤
│             15              │
└─────────────────────────────┘
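
To make the type mismatch concrete, here is a small Python sketch that contrasts a shift that keeps the value as fixed-length bytes with a plain integer shift. The helper `shift_right_bytes` is hypothetical and only mimics the shape of the two results quoted above (0F versus 15); it does not reproduce either engine's implementation.

```python
def shift_right_bytes(data: bytes, shift: int) -> bytes:
    """Shift a big-endian byte string right by `shift` bits, keeping its length."""
    as_int = int.from_bytes(data, "big") >> shift
    return as_int.to_bytes(len(data), "big")


# Binary-preserving shift: the result is still a 1-byte binary value.
binary_result = shift_right_bytes(bytes.fromhex("FF"), 4)

# Integer shift: the same bits, but the result is a number.
integer_result = 0xFF >> 4

print(binary_result.hex().upper(), integer_result)  # prints "0F 15"
```

Both results carry the same bits; only the result type differs, which is what the question above is about.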
Collaborator

What if we annotate the input w/ snowflake? I think preserving the type requires type inference to run first.

Collaborator Author

The latest patch, which also fixes the fundamental HEX transpilation issue, resolves this. I've made HEX resolve to binary by default.

SELECT CAST(CAST(UNHEX('FF') AS BIT) >> 4 AS BLOB);

@fivetran-kwoodbeck
Collaborator Author

fivetran-kwoodbeck commented Dec 24, 2025

Hi @VaggelisD, the tests don't pass, and I'm not sure of the best way to deal with it (there has been a lot of back and forth). There are two issues, both of which can be seen here:

SELECT BITSHIFTLEFT(X'0080'::BINARY, 1);
SELECT BITSHIFTRIGHT(BITSHIFTLEFT(X'0080'::BINARY, 1), 1);

The first is an INT-to-BLOB issue. When transpiled:

SELECT CAST(CAST(CAST(128 AS BLOB) AS BIT) << 1 AS BLOB);

Conversion Error: Unimplemented type for cast (INTEGER -> BLOB)
LINE 1: SELECT CAST(CAST(CAST(128 AS BLOB) AS BIT) << 1 AS BLOB)

I think it needs an UNHEX and a cast to BIT (see below). Second, once the hex issue is fixed, another problem appears: the first parameter is not recognized as BINARY, so it is assumed to be INT, which is wrong:

SELECT CAST(CAST(CAST(UNHEX('0080') AS BIT) << 1 AS BLOB) AS INT128) >> 1

Conversion Error: Unimplemented type for cast (BLOB -> HUGEINT)
LINE 1: SELECT CAST(CAST(CAST(UNHEX('0080') AS BIT) << 1 AS BLOB) AS INT128...

I think this can be solved with an extra annotate_types call on the first parameter, but I'm not entirely sure.
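
As a rough sketch of why the operand's type has to be known before the shift is rendered, the Python below picks between the two DuckDB shapes quoted in this thread based on a type flag. `Expr` and `render_shift` are hypothetical names invented for this example; this is not sqlglot's generator code, and whether the nested output behaves as intended in DuckDB would still need to be checked.

```python
from dataclasses import dataclass


@dataclass
class Expr:
    sql: str
    is_binary: bool  # the piece of type information the generator needs


def render_shift(operand: Expr, op: str, amount: int) -> Expr:
    """Render a shift using one of the two shapes quoted in the thread (hypothetical)."""
    if operand.is_binary:
        # Binary path: shift as BIT, then cast back to BLOB.
        sql = f"CAST(CAST({operand.sql} AS BIT) {op} {amount} AS BLOB)"
    else:
        # Integer path: widen to INT128 before shifting.
        sql = f"CAST({operand.sql} AS INT128) {op} {amount}"
    # The result keeps the operand's kind, so a nested shift takes the same path.
    return Expr(sql, operand.is_binary)


inner = render_shift(Expr("UNHEX('0080')", is_binary=True), "<<", 1)
outer = render_shift(inner, ">>", 1)
print(outer.sql)
```

If the flag were lost on `inner`, the outer shift would fall into the integer branch and wrap a BLOB in `CAST(... AS INT128)`, which is the second error shown above.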

fivetran-kwoodbeck force-pushed the feature/transpile-bitshift branch from 526cd22 to bc2170a on December 29, 2025 23:05

# `from_hex` has transpiled x'ABCD' (BINARY) to DuckDB's '\xAB\xCD' (BINARY)
# `to_hex` & CASTing transforms it to "ABCD" (BINARY) to match representation
to_hex = exp.cast(self.func("TO_HEX", from_hex), exp.DataType.Type.BLOB)
Collaborator Author

This line is invalid: it applies TO_HEX to the preceding FROM_HEX, which turns the value back into a string that is then cast to BINARY.

"clickhouse": UnsupportedError,
"databricks": "SELECT X'CC'",
"drill": "SELECT 204",
"duckdb": "SELECT CAST(HEX(FROM_HEX('CC')) AS VARBINARY)",
Collaborator Author

This returns a different result in DuckDB than 'CC'
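
A Python analogy of the hex round trip, assuming that casting a string to a binary type stores the string's character bytes. This only illustrates the mismatch described above; it is not a statement of DuckDB's exact semantics.

```python
raw = bytes.fromhex("CC")          # the single byte 0xCC, which is what X'CC' denotes
hex_text = raw.hex().upper()       # hex-encoding it again gives the string "CC"
recast = hex_text.encode("ascii")  # turning that string into bytes gives b"CC"

print(raw, recast, raw == recast)  # prints b'\xcc' b'CC' False
```

The round trip ends with two bytes (the characters 'C' and 'C') rather than the one byte 0xCC, so the value no longer matches the original literal.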
