-
Notifications
You must be signed in to change notification settings - Fork 680
feat(api): implement upsert() using MERGE INTO
#11624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
5a27b34 to
21994eb
Compare
|
@cpcloud Requesting review for initial feedback while I try to improve backend support; I assume I could always mark a lot of them |
46b0e4a to
8e47b79
Compare
| query = query.sql(self.dialect) | ||
|
|
||
| if "MERGE" in query: | ||
| query = f"{query};" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm honestly not sure how to better do this; I wasn't able to figure out how to add a semicolon to an expression in SQLGlot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't ideal but I think is fine with me. The "cleaner" way would be to override the _build_upsert_from_table method, but the amount of boilerplate for that feels not worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or actually, is it not possible to just always stick on a ; at the end of _build_upsert_from_table() for every backend? or does that break on some backends?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or actually, is it not possible to just always stick on a ; at the end of _build_upsert_from_table() for every backend?
I don't actually know how to stick a semicolon on the end of a SQLGlot expression. 😅 If _build_upsert_from_table() was handling the conversion to SQL, that would have been simple enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I understand.
huh, is this a bug in the upstream mssql engine? Can you confirm that only merge statements require trailing semicolons, but other sql queries/statements do not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, semicolons are only enforced for certain statements, including MERGE.
There's also a link in that answer to the official documentation stating that, at some point in the future, they will require terminators for every statement, but that clearly hasn't happened in the nearly 10 years since that answer was given.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say either:
- add a comment saying why only MERGE gets a semicolon, in addition linking to that stackoverflow.
- just always append a semicolor
Slight preference for 2 due to the less cognitive overhead for future developers
2e1403d to
b40103e
Compare
25061ad to
d052b43
Compare
d2b5922 to
8b0d6fe
Compare
|
@cpcloud I'm temporarily cherry-picked the changes from #11636 and the updates to With that, all of the functionality to implement |
adc70e7 to
ffc8125
Compare
@cpcloud FYI I've done this and everything is passing! Should be ready to go. |
|
@NickCrews Addressed all your comments (except didn't add an xfail test, but replied to your suggestion). Please feel free to take another look! |
|
Thanks @deepyaman, those tweaks all look great. I came up with several more test cases I'd love to see. Sorry for the continually moving target. |
5f12ee5 to
10be51c
Compare
@NickCrews No worries, thanks for the revised review! I believe I've largely addressed your updated feedback. |
a2458ce to
613dce6
Compare
|
|
||
| @pytest.fixture | ||
| def test_employee_data_3(): | ||
| import pandas as pd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is already imported at the top level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description of changes
Implement
Backend.upsert()usingsqlglot.expressions.merge()under the hood. Upsert support is very important, especially for data engineering use cases.Starting with a most basic implementation, including only supporting one join column. I think this could be expanded to support a list without much effort.
MERGE INTOsupport is limited. DuckDB only added support forMERGEstatements earlier today in 1.4.0, and many other backends don't support it. However, it seems like the more standard/correct approach for supporting upserts, and it doesn't require merge keys defined ahead of time on tables.Backends that work:
(currently using a hack to work around "AS" getting added to MERGE statement);onto the end ofeverystatement 😅)Should work, need help to test:
Backends that don't work:
MERGEstatement is only supported for Iceberg tables.")MERGE INTOis transactional and is supported only for Apache Iceberg tables in Athena engine version 3.")Issues closed
Backend.upsert#5391