-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Local Testing] SNOW-783554 Support selecting window expressions #1049
[Local Testing] SNOW-783554 Support selecting window expressions #1049
Conversation
sfc-gh-stan
commented
Sep 13, 2023
•
edited
Loading
edited
- Make non-rank related functions with unspecified window frame use cumulative window by default
- Support aggregating window expressions
- Support more rank related functions/Refactor to allow patching for rank related functions (This will be addressed in a separate PR).
be640d9
to
d07b157
Compare
@@ -137,7 +139,7 @@ def mock_avg(column: ColumnEmulator) -> ColumnEmulator: | |||
cnt += 1 | |||
# round to 5 according to snowflake spec | |||
ret = ( | |||
ColumnEmulator(data=[round((ret / cnt), 5)]) | |||
ColumnEmulator(data=[round((ret / cnt), 3)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are other places in this module where we round to 5 (I did this previously)
so do we also need to update other mock implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I will address these altogether in a separate PR.
d07b157
to
40a2e7e
Compare
I tried this
Two problems:
|
4934593
to
56b2034
Compare
56b2034
to
71ca36c
Compare
@@ -285,7 +308,7 @@ def analyze( | |||
self.session._conn._telemetry_client.send_function_usage_telemetry( | |||
expr.api_call_source, TelemetryField.FUNC_CAT_USAGE.value | |||
) | |||
func_name = expr.name.upper() if parse_local_name else expr.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change always converts function names to uppercase. cc: @sfc-gh-yixie
@@ -270,6 +270,8 @@ def add_date_and_number( | |||
|
|||
|
|||
class ColumnEmulator(pd.Series): | |||
_metadata = ["sf_type"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this to preserve sf_type
after index operations like reset_index
or sort_index
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this is used anyhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed by pandas internally to preserve added property in subclasses: https://pandas.pydata.org/docs/development/extending.html#define-original-properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we add a comment here?
@@ -251,20 +265,30 @@ def test_range_between_should_accept_at_most_one_order_by_expression_when_unboun | |||
df.select( | |||
min_("key").over(window.range_between(Window.unboundedPreceding, 1)) | |||
).collect() | |||
assert "Cumulative window frame unsupported for function MIN" in str(ex_info) | |||
if not local_testing_mode: | |||
assert "Cumulative window frame unsupported for function MIN" in str( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message doesn't make sense. This error is caused by window.range_between(Window.unboundedPreceding, 1)
, you cannot specify range between for non-cumulative window frames. As an evidence, changing the expression to window.range_between(Window.unboundedPreceding, 0)
runs regardless of the number of order expressions.
@@ -283,19 +307,28 @@ def test_range_between_should_accept_numeric_values_only_when_bounded(session): | |||
df.select( | |||
min_("value").over(window.range_between(Window.unboundedPreceding, 1)) | |||
).collect() | |||
assert "Cumulative window frame unsupported for function MIN" in str(ex_info) | |||
if not local_testing_mode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as above, the error message is confusing and the actual failure is not caused by numeric values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should rewrite these two tests for both live connection and local testing.
Fixed. |
) # dtype=object prevents implicit converting None to Nan | ||
res_col.index = res_index | ||
return res_col.sort_index() | ||
elif isinstance(window_function, FirstValue): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it feels that FirstValue and LastValue calculation are sharing lots of logic
except for the w.iloc[0]
vs w.iloc[len(w) - 1]
and for cur_idx in range(len(w)):
vs for cur_idx in range(len(w) - 1, -1, -1):
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very true. I am planning to refactor this part in a separate PR so all window function definitions can be found in mock/functions.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's create a JIRA or log it in an existing jira so that we don't forget?