-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-841405 Fix df copy and enable basic diamond shaped joins for simplifier #1003
SNOW-841405 Fix df copy and enable basic diamond shaped joins for simplifier #1003
Conversation
assert df1.a._expression.expr_id != df2.a._expression.expr_id | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we testing expr_id
s are not the same. Does it affect dataframes in weird ways if they are same? Can we test that those subsequent side effects of expr_id
s being the same are tested instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they are the same we won't be able to perform the join.
for v in resolved_children.values(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % some curious questions. Please also address Afroz's comment on test :)
self.df_aliased_col_name_to_real_col_name: DefaultDict[ | ||
str, Dict[str, str] | ||
] = defaultdict(dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious what triggered this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is bug actually, I intended it to be what the type hints suggests: a defaultdict of dictionaries. This PR fixes dataframe.copy to work for simplifier path, so this is caught.
Could you use |
Performance test resultsMain
fix-df-copy-and-diamond-shaped-joins-for-simplifier
|
a7ea86b
to
40cf9e4
Compare
Please answer these questions before submitting your pull requests. Thanks!
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Also fixes SNOW-831813: Unsupported subquery type cannot be evaluated of leftsemi join #875 and SNOW-837419: Self-join on Snowpark 1.4.0 causes an
invalid_identifier
error #886Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Fixes
DataFrame.copy
for simplifier path by make sure attributes are copied from the subquery so thatNamedExpression.expr_id
is regenerated. As a result,df1.filter(col("a") > 1).a
has a different expression id thandf1.a
, which enables basic diamond shaped joins like